From pushplata.singh at teri.res.in Sun Mar 2 23:29:37 2014
From: pushplata.singh at teri.res.in (Pushplata Singh)
Date: Mon, 3 Mar 2014 10:59:37 +0530
Subject: [maker-devel] Query on Hardware requirement
Message-ID:

Hi,

I am trying to assemble and analyse (bioinformatics) the genome sequence of a 35 GB fungal genome. The raw data generated by Illumina sequencing is ~15 GB. Could you please suggest the system (hardware) requirements for installing and running the MAKER and ALLPATHS-LG software for this job?

Thank you

Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block, India Habitat Centre, Lodhi Road
New Delhi 110003
India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145
------------------------------------------------------------------------------------------------------------
Disclaimer: The information contained in this e-mail is intended for the person or entity to which it is addressed, and it may contain confidential and/or privileged material. Any review or other use of this mail or taking any action based on it by persons or entities other than the intended recipient is strictly prohibited. If you receive this e-mail by mistake, please contact the sender, and delete all copies of this mail. This e-mail has been scanned and verified by McAfee SaaS Email Security, formerly MX Logic.

From dence at genetics.utah.edu Mon Mar 3 08:11:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 3 Mar 2014 14:11:34 +0000
Subject: [maker-devel] Query on Hardware requirement
In-Reply-To:
References:
Message-ID:

Hi Pradeep,

I think Allpaths is developed by the Broad Institute, so you'd have to check their documentation for its system requirements. MAKER can be installed on Linux and Mac OS X computers. The throughput you'll be able to achieve with MAKER depends on how many processors and how much RAM the machine has.

To take advantage of MAKER's ability to parallelize the annotation process, you need an MPI implementation installed on your machine. MAKER can try to install MPI for you, but a manual installation is usually required.

I hope that helps.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carson.holt at genetics.utah.edu Mon Mar 3 13:08:49 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 3 Mar 2014 19:08:49 +0000
Subject: [maker-devel] FW: error runinig agustus
In-Reply-To:
References:
Message-ID:

Forwarding this to the maker-devel list.

On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote:

>I encountered the following error while running MAKER (a 2nd annotation run
>using the GFF file from the first MAKER run and Trinity-assembled RNA-seq
>as EST evidence):
>
>ERROR: Augustus failed
>--> rank=NA, hostname=rapa.agr.gc.ca
>
>Note: the 1st MAKER run was done with MAKER 2.10; for the 2nd one I
>am using 2.31.
>
>Your help is appreciated.
>
>HB

From carsonhh at gmail.com Mon Mar 3 13:11:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Mar 2014 12:11:08 -0700
Subject: [maker-devel] FW: error runinig agustus
Message-ID:

You will need to provide more detail, probably the entire error log and the MAKER control files.

Thanks,
Carson

From sjackman at gmail.com Tue Mar 4 20:10:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 4 Mar 2014 18:10:42 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson.

I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and *three_prime_UTR* features. Ideally the feature type would be set to *rRNA* or *tRNA* as appropriate, and the UTR and CDS features would be omitted. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this.

Cheers,
Shaun

On 27 February 2014 17:13, Carson Holt wrote:

> Set single_exon=1, and the minimum size to a smaller value. I think it's
> set to 250 right now. Also, est2genome is looking for an ORF, so if there
> is none (as with tRNAs) they probably won't get picked up.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>
> Sorry, ignore my previous question.
> est_forward also carries forward the names of protein evidence and works
> like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
> identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value
> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
> these hits?
>
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
>
> Cheers,
> Shaun
>
>
> On 27 February 2014 15:17, Shaun Jackman wrote:
>
>> Is there a corresponding protein_forward=1 option to map forward protein
>> names from protein2genome?
>>
>> Cheers,
>> Shaun
>>
>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>> wrote:
>>
>> Sorry, I meant to say prefilter on the score in the mRNA column before
>> passing the GFF3 to model_gff.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>
>> What you can do is run it once with just est_forward=1 and
>> est2genome/protein2genome set to 1. Then take those results, pass them in
>> as model_gff and use the map_forward option to then filter the results
>> based on mRNA score, and that would copy names onto new genes under the
>> standard MAKER pipeline. Eventually it's really supposed to go into a
>> separate tool that will map genes onto new assemblies (but under the hood
>> the tool will just be calling MAKER with certain parameters restricted).
>> I do this because if people commonly use it mixed with things like SNAP
>> I can start to get some very weird behaviors.
>>
>> Thanks,
>> Carson
>>
>> From: Mikael Brandström Durling
>> Date: Wednesday, February 26, 2014 at 3:04 PM
>> To: Carson Holt
>> Cc: "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] Mapping gene names
>>
>> It seems that this could be a very useful option in those cases where
>> you have firm a priori knowledge of the placement of ESTs. However, while
>> trying it I noticed that est_forward implicitly turns on the est2genome
>> predictor. Is this necessary for this to work? I'm after the behavior
>> you describe below, where exonerate is made to try really hard within a
>> limited region to align an EST, but I would not like MAKER to produce
>> est2genome predictions.
>>
>> In general, I think this maker_coor and est_forward feature set is
>> worthy of being promoted to a documented feature.
>>
>> Thanks,
>> Mikael
>>
>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>
>> It will still work without est_forward. It just works a little
>> differently. Keep in mind this was a hidden feature I used to find
>> stubborn or hard-to-find missing genes after reassembly of a genome.
>>
>> If est_forward is provided, MAKER will parse the database to look for the
>> maker_coor tags early in the pipeline. Then it will create a list of
>> locations to search, and it will search them even if there are no BLAST
>> results to seed the search (normally MAKER gets a BLAST result first and
>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>> look for a match using all of chr1 as the input to exonerate even when
>> BLAST finds nothing (this is a very, very slow search, but can help pick
>> up one or two stubborn genes that don't remap well). To allow this, MAKER
>> gives exonerate looser matching parameters (i.e., it allows single base
>> pair introns, perhaps caused by assembly errors).
>> The logic here is that, given that I already told MAKER that with some
>> degree of confidence I expect sequence A to map to location X, it will
>> try its hardest to make it match.
>>
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>> line 1563, but only after a BLAST alignment has already seeded it to the
>> region (that BLAST result has the information in its description
>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>> In addition, any BLAST seeds that overlap maker_coor will get the search
>> space for alignment polishing adjusted to match maker_coor exactly. Also,
>> match parameters for exonerate will not be relaxed as they were with
>> est_forward.
>>
>> As you can see, the behavior is slightly different (because it's an
>> accidental feature).
>>
>> Thanks,
>> Carson
>>
>> From: Mikael Brandström Durling
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt
>> Cc: "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] Mapping gene names
>>
>> That might be a useful and time-saving accidental feature. But, reading
>> the code, it seems that I need to supply maker_coor but not gene_id, as
>> well as the configuration option est_forward, for this to work. Any
>> occurrences of maker_coor in GI.pm seem to be conditioned on
>> est_forward=1, right?
>>
>> Mikael
>>
>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>
>> Yes. That should work as well as an accidental feature.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>> <mikael.durling at slu.se> wrote:
>>
>> Can this use of maker_coor be used only to hint about the placement of
>> the ESTs, without affecting the naming of the final genes? I.e., if I have
>> a database of ESTs where I have a priori knowledge of their rough
>> placement, can this placement be given to MAKER without providing
>> est_forward=1?
>>
>> Thanks,
>> Mikael
>>
>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>
>> There is a way. It's not a standard option and it's undocumented, but
>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>> that. The option won't already be there, so you'll have to type it in.
>>
>> There is also a feature designed to work with this option. If you add
>> tags to your FASTA headers, those can be used to guide the mapping and
>> naming. For example, gene_id= will ensure different isoforms
>> that share a common gene_id get clustered into the same gene,
>> and maker_coor=chr1:1-10000 in the FASTA header will force a particular
>> sequence to only be mapped against chr1 within the range of 1-10000 bp,
>> and just using maker_coor=chr1 will force it to only be mapped against
>> chr1.
>>
>> This is an undocumented way to remap genes onto new assemblies using
>> BLAST alignments of earlier transcript or protein annotations as a guide.
>>
>> --Carson
>>
>> From: Shaun Jackman
>> Reply-To: Shaun Jackman
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To:
>> Subject: [maker-devel] Mapping gene names
>>
>> Hi,
>>
>> I'm annotating a genome using a closely related genome from GenBank,
>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence to
>> annotate my genome. I've run MAKER, and the annotation seems to have
>> worked well. Is it possible to map the names of the genes from the related
>> species to my annotation? I see the *map_forward* option, which applies to
>> the *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>
>> *maker_opts.ctl*
>>
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>>
>> Thanks,
>> Shaun

From carsonhh at gmail.com Tue Mar 4 20:33:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Mar 2014 19:33:12 -0700
Subject: Re: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false-positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So, just like other prediction tools (SNAP, Augustus, etc.), MAKER's primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson
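
Shaun's proposed rule in this thread (names starting with rrn are rRNAs, names starting with trn are tRNAs) is not something MAKER does, per Carson's reply, but it could be applied afterwards as a post-processing step. The Python sketch below is a hypothetical helper, not part of MAKER. It assumes a GFF3 file in which each transcript carries ID and Name attributes and its child features point to it through Parent; it retypes matching mRNA features and drops their five_prime_UTR, CDS and three_prime_UTR children while keeping the exons.

```python
# Hypothetical post-processing sketch, NOT a MAKER feature: retype
# est2genome "mRNA" models named rrn*/trn* and drop their UTR/CDS children.
RNA_PREFIXES = {"rrn": "rRNA", "trn": "tRNA"}  # naming convention from the thread
CODING_CHILDREN = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=m1;Name=rrn5' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def retype_noncoding(gff_lines):
    """Return GFF3 lines with rrn*/trn* mRNAs retyped as rRNA/tRNA and their
    five_prime_UTR/CDS/three_prime_UTR children removed (exons are kept)."""
    rows, retyped = [], set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            rows.append(line)  # directives and comments pass through unchanged
            continue
        cols = line.split("\t")
        if cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            name = attrs.get("Name", "")
            for prefix, new_type in RNA_PREFIXES.items():
                if name.startswith(prefix):
                    cols[2] = new_type
                    retyped.add(attrs.get("ID"))
                    break
        rows.append(cols)

    # Second pass, so children are handled even if they precede their parent.
    out = []
    for row in rows:
        if isinstance(row, str):
            out.append(row)
            continue
        parents = parse_attributes(row[8]).get("Parent", "").split(",")
        if row[2] in CODING_CHILDREN and retyped.intersection(parents):
            continue  # UTR/CDS child of a retyped transcript: drop it
        out.append("\t".join(row))
    return out
```

For example, the function could be applied to the lines of a MAKER GFF3 output file and the result written to a new file; the gene features themselves are left untouched.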
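
Carson's suggestion in this thread is to run once with est_forward=1, prefilter the resulting models on the score in the mRNA column, and pass the filtered GFF3 back in as model_gff together with map_forward. A minimal sketch of the prefilter step follows. It assumes the score in GFF3 column 6 is numeric with higher meaning better, treats "." (no score) as failing, and keeps only gene models (evidence alignment features are dropped); the function name and the threshold are illustrative choices, not MAKER settings.

```python
# Hypothetical prefilter sketch: keep gene models whose mRNA score (GFF3
# column 6) reaches a threshold before reusing the file as model_gff.
def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=m1;Parent=g1' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(gff_lines, min_score):
    """Return the lines of transcripts scoring >= min_score, plus their
    parent genes and child features (exons, CDS, UTRs)."""
    records, keep_mrna, keep_gene = [], set(), set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            records.append((line, None, {}))
            continue
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        # Assumption: '.' means "no score" and fails the filter.
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            keep_mrna.add(attrs.get("ID"))
            keep_gene.update(attrs.get("Parent", "").split(","))
        records.append((line, cols[2], attrs))

    kept = []
    for line, ftype, attrs in records:
        if ftype is None:
            kept.append(line)  # directives and comments
        elif ftype == "gene":
            if attrs.get("ID") in keep_gene:
                kept.append(line)
        elif ftype == "mRNA":
            if attrs.get("ID") in keep_mrna:
                kept.append(line)
        elif keep_mrna.intersection(attrs.get("Parent", "").split(",")):
            kept.append(line)  # exon/CDS/UTR of a kept transcript
    return kept
```

A gene with several isoforms keeps only the isoforms that pass; the gene line survives as long as at least one of them does.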
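
Carson's description of how maker_coor= behaves when est_forward is not set (seeds entirely outside the region are ignored; seeds that overlap it have their polishing window set to exactly the maker_coor region) can be modeled as simple interval logic. This is a hypothetical sketch of that rule, not MAKER's GI.pm code. Coordinates are assumed to be 1-based and inclusive, and a bare sequence name such as chr1 is treated as the whole sequence, which requires knowing its length.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_maker_coor(value: str, seq_lengths: Optional[Dict[str, int]] = None) -> Region:
    """'chr1:1-10000' -> Region('chr1', 1, 10000); a bare 'chr1' means the
    whole sequence, so its length must be supplied in seq_lengths."""
    if ":" in value:
        seqid, span = value.split(":", 1)
        start, end = (int(x) for x in span.split("-"))
        return Region(seqid, start, end)
    if not seq_lengths or value not in seq_lengths:
        raise ValueError(f"length of {value!r} is needed for a whole-sequence constraint")
    return Region(value, 1, seq_lengths[value])


def constrain_seeds(seeds: List[Region], coor: Region) -> List[Tuple[Region, Region]]:
    """Pair each BLAST seed with its polishing window: seeds entirely outside
    maker_coor are dropped, overlapping seeds get exactly the maker_coor region."""
    windows = []
    for seed in seeds:
        if seed.seqid != coor.seqid:
            continue  # different sequence: ignored
        if seed.end < coor.start or seed.start > coor.end:
            continue  # no overlap with the constraint: ignored
        windows.append((seed, coor))
    return windows
```

Note that a seed hanging partly outside the region still qualifies; it is the polishing window, not the seed, that gets clipped to the constraint.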
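
The FASTA header tags Carson describes (gene_id= to cluster isoforms into one gene, maker_coor= to restrict where a sequence may map) can be prepared and checked with a few lines of Python before a run. The sketch assumes the tags are whitespace-separated key=value tokens following the sequence ID; how MAKER itself tokenizes headers is not stated in the thread, and the helper names are hypothetical.

```python
from collections import defaultdict


def tag_header(seq_id, **tags):
    """Build a header such as '>tx1 gene_id=geneA maker_coor=chr1:1-10000'."""
    extra = " ".join(f"{key}={value}" for key, value in tags.items() if value is not None)
    return f">{seq_id} {extra}".rstrip()


def parse_header(header):
    """Split a FASTA header into the sequence ID and a dict of key=value tags."""
    tokens = header.lstrip(">").split()
    seq_id, tags = tokens[0], {}
    for token in tokens[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            tags[key] = value
    return seq_id, tags


def group_isoforms(headers):
    """Group sequence IDs by gene_id; untagged sequences form their own group."""
    groups = defaultdict(list)
    for header in headers:
        seq_id, tags = parse_header(header)
        groups[tags.get("gene_id", seq_id)].append(seq_id)
    return dict(groups)
```

Running group_isoforms over an evidence file's headers is a quick way to confirm that the intended isoforms really share a gene_id before MAKER clusters them.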

From felix.bemm at uni-wuerzburg.de Wed Mar 5 10:35:33 2014
From: felix.bemm at uni-wuerzburg.de (Felix Bemm)
Date: Wed, 05 Mar 2014 17:35:33 +0100
Subject: [maker-devel] Build Issues - v2.31
Message-ID: <53175255.4050102@uni-wuerzburg.de>

Hi,

I am trying to build MAKER version 2.31.
I got the following error:

  Configuring MAKER with MPI support
  'CCFLAGSEX' is not a valid config option for Inline::C
   at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 236
   at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 256
      Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', '/software/mpich2-1.5rc3/include', 'blib', '') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
      MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024
      Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
      Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
      MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024
      Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
      Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at ./Build line 70

The same procedure worked with 2.29-beta!

Any ideas?

Felix

--
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
felix.bemm at uni-wuerzburg.de

From carsonhh at gmail.com Wed Mar 5 10:40:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 09:40:05 -0700
Subject: [maker-devel] Build Issues - v2.31
In-Reply-To: <53175255.4050102@uni-wuerzburg.de>
References: <53175255.4050102@uni-wuerzburg.de>
Message-ID:

You need to update your Inline::C module. The CCFLAGSEX option was added to Inline::C a couple of years ago to allow users to pass flags to the compiler.

Thanks,
Carson
> >Felix > >-- >Felix Bemm >Department of Bioinformatics >University of W?rzburg, Germany >Tel: +49 931 - 31 83696 >Fax: +49 931 - 31 84552 >felix.bemm at uni-wuerzburg.de > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carson.holt at genetics.utah.edu Wed Mar 5 13:02:26 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 5 Mar 2014 19:02:26 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: On 3/5/14, 11:59 AM, "Borhan, Hossein" wrote: >Dear Maker users > >I want to run maker on a fungal genome of about 45 Mb with about 1/3 of >the genome begin repeat rich. But most of the virulent genes are located >within the repeat regions flanked but stretch of repeats. I am not sure >if I use the repeat masker option I am going to miss out on the >predication of these virulent genes located within the repeats. > >Other concerns with the setting in maker-opts file for fungal genomes are: > >single_exon = 0 should this get changed to 1 since single exon genes >are quit common in fungi and what is the consequence of this on using EST >and assembled RNA as evidence for gene prediction > >correct_est_fusion=0 #limits use of ESTs in annotation >to avoid fusion genes as I understand this option will remove the >overlapping UTRs but what is the consequence of setting this option on >the use of EST for predicting ORFs > > >Thanks > > > >HB > > > > From carsonhh at gmail.com Wed Mar 5 13:17:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 12:17:57 -0700 Subject: [maker-devel] FW: maker-control file Message-ID: Not using repeat masking will cause many problems. 
Besides, a gene being flanked by repeats does not mean it will be lost. Any evidence alignments that can seed in non-repetitive regions (genes/exons) are still allowed to extend into repetitive regions during the polishing stage (aligners have two stages: seed and extend). So transposons should never seed, but genes will, because their sequence will contain non-repetitive regions (even if they are near repeats).

single_exon should be set to 1 for fungi; just make sure to set the minimum length of single-exon evidence to something reasonable like 250 bp.

correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set and not est2genome (which should already have been turned off by then).

Thanks,
Carson
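
Carson's argument for repeat masking rests on the seed-and-extend design of aligners: masked repeat sequence cannot start an alignment, but an alignment seeded in unique sequence may extend into it. The toy Python sketch below illustrates only that idea, assuming repeats are soft-masked (lowercase) in the genome string. Lowercase bases are excluded from exact k-mer seeding, and extension is exact-match, ungapped and case-insensitive; real aligners such as BLAST and exonerate score mismatches and gaps, so this is not how either tool is implemented.

```python
def find_seeds(query, genome, k):
    """Exact k-mer seeds as (query_pos, genome_pos) pairs, 0-based. A genome
    window containing any soft-masked (lowercase) base is not indexed, so
    repeat sequence can never start an alignment."""
    index = {}
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window.isupper():
            index.setdefault(window, []).append(i)
    query = query.upper()
    seeds = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            seeds.append((q, g))
    return seeds


def extend(query, genome, q, g, k):
    """Ungapped, exact-match extension of a seed in both directions. Masking
    is ignored here, so the alignment may run into repeat sequence."""
    Q, G = query.upper(), genome.upper()
    left = 0
    while q - left > 0 and g - left > 0 and Q[q - left - 1] == G[g - left - 1]:
        left += 1
    right = k
    while q + right < len(Q) and g + right < len(G) and Q[q + right] == G[g + right]:
        right += 1
    return g - left, g + right  # 0-based, half-open span on the genome
```

With a genome such as "tttttCCGTTAGCggatccttttt", a query matching "CCGTTAGCGGATCC" seeds only in the uppercase stretch, yet its extension covers the lowercase "ggatcc" as well, while a query made only of repeat sequence finds no seed at all.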
From marc.hoeppner at imbim.uu.se Thu Mar 6 01:26:29 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 07:26:29 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Hi, I think this is an interesting comment on which I would like a bit more information: correct_est_fusion should not be used together with est2genome. It won't fail, you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set, and not est2genome (which should already have been turned off by then).
My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome (with Augustus achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab initio predictions, which seems problematic to me. /Marc -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 6 08:29:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 07:29:35 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: > Wouldn't it be more sensible to rely on the evidence over probabilistic > models? Yes. In fact, that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic).
Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (i.e. A. thaliana parameters to annotate C. elegans) you get as much as a 3-fold increase in exon-level accuracy by using the hint feedback from MAKER. With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). > The annotation would be partial, but on the other hand the chance of > incorporating false signals is smaller (assuming I can generate a clean set > of transcripts from RNA-seq data)? False signals are abundant. It's just the nature of how ESTs and especially mRNA-seq reads are generated and anchored back to the assembly. By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNA-seq), a lot of this is eliminated. > As an example, using SNAP and Augustus on a bird genome (with Augustus > achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of > false exons that were simply not supported by the RNA-seq data, yet made it > into the final gene build. You will get false positives from the est2genome-alone approach as well. Models will be more partial, and the false negative rate will be very high (often 30-70%). Also look at the MAKER2 paper, Figure 1. The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). > Is it possible to get some more details on how Maker uses ab initio predictions > and reconciles them with evidence alignments? At the moment it seems to me > that Maker gives higher weight to the ab initio predictions, which seems > problematic to me. Take a look at the MAKER, MAKER2, and MAKER-P papers.
Final genes are chosen based on evidence overlap using AED (completely evidence-based). It is the model generation that leverages the hint-based feedback. The names of MAKER genes tell you what the source of the model is. Any time a hint-based model matches the evidence better, the name starts with maker, followed by the sequence name and the predictor (e.g. maker-chr1-snap-gene-0.4). When the ab initio model matches better than the hint-based model, the name starts with the predictor and contains abinit (e.g. snap-chr1-abinit-gene-0.2). In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback working together with the probabilistic models. Thanks, Carson
From marc.hoeppner at imbim.uu.se Thu Mar 6 08:40:48 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 14:40:48 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Hi Carson, Thanks for the detailed feedback; this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially given the near-perfect strandedness of newer protocols. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by letting Maker work with a semi-refined ab initio model.
But it can be a bit tricky to hit that sweet spot (we did validate >4000 models manually in order to make that sort of assessment, though). Still, I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me: I think it would be fantastic if you guys could put together a wiki for Maker. This is such a useful and powerful tool, but clearly there are many things people should get a proper explanation of that have only ever been discussed here on this list: best practices, experimental features, etc. Regards, Marc
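Carson's answer above mentions two mechanisms that are easy to sketch: AED, which picks the final genes by evidence overlap, and the gene-name pattern that shows whether a hint-based or an ab initio model won. The Python below is illustrative only. It assumes AED is computed as described in the MAKER papers, 1 minus the mean of nucleotide-level sensitivity and specificity against the evidence (MAKER's own code may pool evidence types differently), and the name patterns are inferred from just the two example names in the thread; `aed` and `model_source` are hypothetical helpers.

```python
import re


def covered(intervals):
    """Set of bases covered by 1-based, inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model, evidence):
    """Annotation Edit Distance: 1 - (SN + SP) / 2 at the nucleotide level.

    0 means model and evidence agree perfectly; 1 means they do not overlap.
    """
    m, e = covered(model), covered(evidence)
    if not m or not e:
        return 1.0
    shared = len(m & e)
    sensitivity = shared / len(e)  # fraction of evidence bases the model covers
    specificity = shared / len(m)  # fraction of model bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


# Patterns inferred from the two example names in the thread (assumption).
HINT_BASED = re.compile(r"^maker-(?P<seq>.+)-(?P<source>[^-]+)-gene-(?P<num>[\d.]+)$")
AB_INITIO = re.compile(r"^(?P<source>[^-]+)-(?P<seq>.+)-abinit-gene-(?P<num>[\d.]+)$")


def model_source(gene_name):
    """Return (kind, predictor, sequence) for a MAKER gene name, or None."""
    for kind, pattern in (("hint-based", HINT_BASED), ("ab initio", AB_INITIO)):
        match = pattern.match(gene_name)
        if match:
            return kind, match.group("source"), match.group("seq")
    return None


print(model_source("maker-chr1-snap-gene-0.4"))   # ('hint-based', 'snap', 'chr1')
print(model_source("snap-chr1-abinit-gene-0.2"))  # ('ab initio', 'snap', 'chr1')
print(aed([(1, 100)], [(51, 150)]))               # 0.5
```

Lower AED is better: a model with AED 0 is fully supported by, and fully covers, its evidence.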
From carsonhh at gmail.com Thu Mar 6 09:03:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 08:03:10 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Message-ID: MAKER wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page Thanks, Carson
From sjackman at gmail.com Thu Mar 6 14:56:34 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 6 Mar 2014 12:56:34 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. I agree that identifying non-coding RNA by homology is, in general, a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward.
It would be great if MAKER had an option for RNA sequence homology, similar to est2genome, that does not imply the sequence is coding. The integration of MAKER-P with tRNAscan is very useful. The identified genes are named like `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name them, or perhaps prefix the names, according to that convention? Cheers, Shaun On 2014-March-04 at 18:33:20, Carson Holt (carsonhh at gmail.com) wrote: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So, just like other prediction tools (SNAP, Augustus, etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and the UTR and CDS features would be omitted. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this.
Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson Sent from my iPhone On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5, rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote: Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff. --Carson Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors.
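Carson's corrected suggestion here is to prefilter the first run's GFF3 on the mRNA score before passing it back in as model_gff with map_forward. A minimal Python sketch of such a prefilter follows. It assumes the score sits in column 6 of each mRNA line and that exons, CDS and UTR features point to their mRNA through the Parent attribute; `filter_models` and the threshold are illustrative, not part of MAKER. Genes left without a passing mRNA are dropped along with the failing mRNA's children, while headers, evidence alignments and any ##FASTA section pass through unchanged.

```python
def parse_attrs(column9):
    """Parse a GFF3 attribute column ('ID=m1;Parent=g1') into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_models(lines, min_score):
    """Yield GFF3 lines, keeping only mRNAs whose score is >= min_score.

    Children of dropped mRNAs and genes with no surviving mRNA are removed;
    every other line is passed through unchanged.
    """
    lines = list(lines)
    all_mrna, kept_mrna, kept_genes = set(), set(), set()
    for line in lines:  # pass 1: decide which mRNAs survive
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attrs(cols[8])
        all_mrna.add(attrs.get("ID"))
        if cols[5] != "." and float(cols[5]) >= min_score:  # '.' means no score
            kept_mrna.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))
    in_fasta = False
    for line in lines:  # pass 2: emit the surviving features
        in_fasta = in_fasta or line.startswith("##FASTA")
        cols = line.rstrip("\n").split("\t")
        if in_fasta or line.startswith("#") or len(cols) != 9:
            yield line
            continue
        attrs = parse_attrs(cols[8])
        parents = set(attrs.get("Parent", "").split(",")) - {""}
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrna
        elif parents & all_mrna:  # exon/CDS/UTR of some mRNA
            keep = bool(parents & kept_mrna)
        else:
            keep = True  # evidence alignments and other features
        if keep:
            yield line
```

Feed it the lines of the first run's GFF3 and write the yielded lines to the file given to model_gff. The thread does not say what score threshold is sensible, so pick one by inspecting the mRNA scores in your own output.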
Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think maker_coor and est_forward form a feature set that deserves to be promoted to a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single-base-pair introns perhaps caused by assembly errors). The logic here is that since I already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match.
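The est_forward behaviour described in this message (collect the maker_coor tags up front, then search each tagged region even without a BLAST seed) can be sketched as a small pre-processing step. The tag syntax, maker_coor=chr1 or maker_coor=chr1:1-10000 in a FASTA header, comes from Carson's original message further down the thread; `parse_maker_coor`, `search_regions`, the sequence lengths and the example headers (including the gene_id value) are hypothetical, and this is not the GI.pm code.

```python
import re

# maker_coor=<seqid> or maker_coor=<seqid>:<start>-<end> inside a FASTA header
MAKER_COOR = re.compile(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header):
    """Return (seqid, start, end) from a FASTA header, or None without a tag.

    start/end are None when only a sequence name is given (maker_coor=chr1).
    """
    match = MAKER_COOR.search(header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return seqid, None, None
    return seqid, int(start), int(end)


def search_regions(headers, seq_lengths):
    """Build the up-front list of (query_id, seqid, start, end) regions,
    as with est_forward=1; untagged queries still rely on BLAST seeds."""
    regions = []
    for header in headers:
        coor = parse_maker_coor(header)
        if coor is None:
            continue
        seqid, start, end = coor
        query_id = header.lstrip(">").split()[0]
        if start is None:  # whole sequence, e.g. maker_coor=chr1
            start, end = 1, seq_lengths[seqid]
        regions.append((query_id, seqid, start, end))
    return regions


headers = [">est1 gene_id=G1 maker_coor=chr1:1-10000", ">est2 maker_coor=chr2", ">est3"]
print(search_regions(headers, {"chr1": 50000, "chr2": 20000}))
# [('est1', 'chr1', 1, 10000), ('est2', 'chr2', 1, 20000)]
```

Whole-sequence tags expand to the full sequence length, which is why Carson warns that maker_coor=chr1 makes for a very slow exonerate search.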
Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward. As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well, as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in. There is also a feature designed to work with this option.
If you add tags to your FASTA headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the FASTA header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide. Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun
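Shaun's two naming requests in this message can be sketched in a few lines: a conventional trnX-NNN name built from the amino acid's one-letter code and the anticodon (as in trnW-CCA), and an rRNA/tRNA/mRNA feature type chosen from the rrn/trn name prefixes. This is a hypothetical post-processing helper, not MAKER's naming code, and it assumes tRNAscan-style three-letter amino-acid names as input.

```python
# Standard three-letter -> one-letter amino-acid codes.
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def trna_name(amino_acid, anticodon):
    """Conventional tRNA gene name, e.g. ('Trp', 'CCA') -> 'trnW-CCA'.

    Returns None for types without a one-letter code (e.g. undetermined).
    """
    letter = ONE_LETTER.get(amino_acid.capitalize())
    if letter is None:
        return None
    return f"trn{letter}-{anticodon.upper()}"


def feature_type(gene_name):
    """GFF3 feature type suggested by the rrn/trn prefix convention."""
    if gene_name.startswith("rrn"):
        return "rRNA"
    if gene_name.startswith("trn"):
        return "tRNA"
    return "mRNA"


print(trna_name("Trp", "CCA"))                                   # trnW-CCA
print([feature_type(n) for n in ("rrn16", "trnW-CCA", "rbcL")])  # ['rRNA', 'tRNA', 'mRNA']
```

Anything without an rrn or trn prefix falls back to mRNA, the type Shaun reports MAKER assigning today.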
From carsonhh at gmail.com Thu Mar 6 14:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Yes. I'll fix the naming.

Thanks,
Carson

From: Shaun Jackman
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun

On 2014-March-04 at 18:33:20, Carson Holt (carsonhh at gmail.com) wrote:
> Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes.
> We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.
>
> Thanks,
> Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.
>
> The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.
>
> Thanks again for your help with this. Cheers,
> Shaun
>
> On 27 February 2014 17:13, Carson Holt wrote:
>> Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>
>>> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
>>>
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?
>>>>
>>>> Cheers,
>>>> Shaun
>>>>
>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>>>
>>>>> Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>>
>>>>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
>>>>>>
>>>>>> Thanks,
>>>>>> Carson
>>>>>>
>>>>>> From: Mikael Brandström Durling
>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>> To: Carson Holt
>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>
>>>>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>> I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.
>>>>>>
>>>>>> In general, I think this maker_coor and est_forward feature set is worthy of being promoted to a documented feature.
>>>>>>
>>>>>> Thanks,
>>>>>> Mikael
>>>>>>
>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>>>
>>>>>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.
>>>>>>>
>>>>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, since I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

From carson.holt at genetics.utah.edu Thu Mar 6 17:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To:
References:
Message-ID:

Your blast_type parameter in maker_bopts.ctl is set to 'wublast', but the executables for wublast are blank in maker_exe.ctl. See, they're blank:

xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable

You either need to provide executables or set your blast_type parameter to something else. For example, you could set it to 'NCBI+', but you will need to fix the location of makeblastdb. makeblastdb is set incorrectly here:

makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable

Alternatively, you can set blast_type to 'NCBI', but you will need to uncomment the executables.
Here:

formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable

--Carson

On 3/6/14, 3:51 PM, "Borhan, Hossein" wrote:

>Hi
>
>I have installed the latest version of BLAST+ and provided the executable paths in maker_exe.ctl as follows:
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx #location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate #location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3 #location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild #location of probuild executable (required for genemark)
>
>But when running maker I get this error:
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the control files.

From sjackman at gmail.com Thu Mar 6 17:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Fantastic. Thanks, Carson.

When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun

On 2014-March-06 at 12:58:50, Carson Holt (carsonhh at gmail.com) wrote:

Yes. I'll fix the naming.

Thanks,
Carson

From: Shaun Jackman
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
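Carson's diagnosis of the blast_type error in the "maker problem with running blast" message comes down to one check: every executable required by the chosen blast_type must be filled in and must point at a file, not a directory. The makeblastdb entry quoted above stops at the ncbi-blast-2.2.29+ install directory, while blastn sits under its bin/ subdirectory, so .../ncbi-blast-2.2.29+/bin/makeblastdb is the likely correction. The Python sketch below runs that check on the text of maker_exe.ctl. It is a hypothetical helper, not part of MAKER; the blast_type-to-executable mapping is read off the comments in the control file quoted above, and the /opt paths in the test are made up.

```python
import os
from typing import Callable, Dict, List

# Executables each blast_type needs, read off the comments in the
# maker_exe.ctl quoted above (illustrative mapping, not MAKER's own code).
REQUIRED: Dict[str, List[str]] = {
    "wublast": ["xdformat", "blasta"],
    "ncbi+": ["makeblastdb", "blastn", "blastx", "tblastx"],
    "ncbi": ["formatdb", "blastall"],
}


def parse_ctl(text: str) -> Dict[str, str]:
    """Parse key=value lines; everything after '#' is treated as a comment."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line or pure comment such as '#-----Location ...'
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def is_executable_file(path: str) -> bool:
    """A directory (like a BLAST+ install prefix) fails this check."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def blast_problems(exe_ctl: str, blast_type: str,
                   is_executable: Callable[[str], bool] = is_executable_file
                   ) -> List[str]:
    """List what is wrong with the executables required by blast_type."""
    opts = parse_ctl(exe_ctl)
    problems: List[str] = []
    for key in REQUIRED[blast_type.lower()]:
        path = opts.get(key, "")
        if not path:
            problems.append(f"{key} is blank or commented out")
        elif not is_executable(path):
            problems.append(f"{key}={path} is not an executable file")
    return problems
```

The is_executable parameter defaults to a real filesystem check; it is injectable so the logic can be exercised without installing BLAST. Run against the configuration above, the sketch would report both WU-BLAST entries as blank for 'wublast' and only the makeblastdb directory path for 'NCBI+'.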
Cheers, Shaun On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn?t even try to do that. ?It focuses only on the coding genes. ?You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). ?So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. ?We?ve only started adding non-coding RNA support recently for iPlant, so it?s still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names with ?trn?, as is standard, so determining the appropriate type should be straight forward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: Set single_exon=1, and the minimum size to a smaller value. ?I think it's set to 250 right now. ?Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson? Sent from my iPhone On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! 
The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: Is there a corresponding?protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson? Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. ?Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. ?Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). ?I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.? Thanks, Carson From: Mikael Brandstr?m Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? 
I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. THanks, Mikael 26 feb 2014 kl. 17:09 skrev Carson Holt : It will still work without est_forward. ?It just works a little differently. ?Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. ?Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). ?So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well). ?To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). ?The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to to location X, it will try its hardest to make it match.? Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). ?MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. ?Also match parameters for exonerate will not be relaxed as they were with est_forward. 
As you can see the behavior, is slightly different (because it?s an accidental feature). Thanks, Carson From: Mikael Brandstr?m Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 right? Mikael 26 feb 2014 kl. 14:22 skrev Carson Holt : Yes. ?That should work as well as an accidental feature. --Carson? Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael 26 feb 2014 kl. 01:58 skrev Carson Holt : There is a way. ?It?s not a standard option and it?s undocumented, but if you add?est_forward=1 to the maker_opts.ctl file, then it will do just that. ?The option won?t already be there so you?ll have to type it in. There is also a feature designed to work with this option. ?If you add tags to your fasta headers, those can be used to guide the mapping and naming. ?For example, gene_id= ?will ensure different isoforms that share a common gene_id get clustered into the same gene, and?maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp ?and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. 
?Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 6 17:38:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:38:48 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Well? not really. I have no plans to add est2genome support for noncoding genes (non-trivial), so you would either have to remove the ncRNA from your input, or filter it out downstream. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 4:33 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Fantastic. Thanks, Carson. 
When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn?t seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: > Yes. I?ll fix the naming. > > Thanks, > Carson > > > From: Shaun Jackman > Date: Thursday, March 6, 2014 at 1:56 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I agree that identifying non-coding RNA by homology in general is > a non-trivial problem. In my particular case, I have a well annotated > reference species that is very closely related (99.2% sequence identity), so > lifting over the annotations from that reference species to my species should > be pretty straight forward. It would be great if MAKER had an option for RNA > sequence homology similar to est2genome that does not imply the sequence is > coding. > > The integration of MAKER-P with tRNAscan is very useful. The identified genes > are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names > with that convention? > > Cheers, > Shaun > > > On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: >> >> Trying to call non-coding RNA from ESTs or even sequence homology is >> extremely messy (non-trivial problem in most organisms with high false >> positive rate), so MAKER for the most part doesn?t even try to do that. It >> focuses only on the coding genes. You can now use tRNAscan and snoscan in >> the newest version for some non-coding RNA support (those features were only >> added a couple of months ago). 
So just like other prediction tools (snap, >> augustus etc.), the primary focus has always been the coding genes. We?ve >> only started adding non-coding RNA support recently for iPlant, so it?s still >> relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >> the tip. >> >> The rRNA genes that are found with est2genome have the feature type set to >> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. >> Ideally the feature type would be set to rRNA or tRNA as appropriate, and >> would omit the UTR and CDS features. Is that a feature that you would be >> interested in adding to MAKER? The rRNA gene names all start with ?rrn? and >> the tRNA gene names with ?trn?, as is standard, so determining the >> appropriate type should be straight forward. >> >> Thanks again for your help with this. Cheers, >> Shaun >> >> >> >> On 27 February 2014 17:13, Carson Holt wrote: >>> Set single_exon=1, and the minimum size to a smaller value. I think it's >>> set to 250 right now. Also est2genome is looking for ORF, so if there is >>> none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>> >>>> Sorry, ignore my previous question. est_forward also carries forward the >>>> names of protein evidence and works like a charm. Thank you! >>>> >>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>>> (2e-66 < eval_blastn=1e-10). 
How should I debug which filter is removing >>>> these hits? >>>> organism_type=prokaryotic >>>> est2genome=1 >>>> protein2genome=1 >>>> est_forward=1 >>>> Cheers, >>>> Shaun >>>> >>>> >>>> >>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>>> Is there a corresponding protein_forward=1 option to map forward protein >>>>> names from protein2genome? >>>>> >>>>> Cheers, >>>>> Shaun >>>>> >>>>> >>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>>> ) wrote: >>>>>> >>>>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>>>> passing the gff3 to model_gff. >>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>>> >>>>>>> What you can do is run it once with just est_forward=1 and >>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>>> in as model_gff and use the map_forward option to then filter the >>>>>>> results based on mRNA score and that would copy names onto new gene >>>>>>> under the standard MAKER pipeline. Eventually it?s really supposed to >>>>>>> go into a separate tool that will map genes onto new assemblies (but >>>>>>> under the hood the tool will just be calling MAKER with certain >>>>>>> parameters restricted). I do this because if people commonly use it >>>>>>> mixed with things like SNAP I can start to get some very weird >>>>>>> behaviors. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> From: Mikael Brandstr?m Durling >>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> It seems that this could be a very useful option in those cases where >>>>>>> you have firm a priori knowledge of the placement of ESTs. However, >>>>>>> while trying it I note that est_forward implies that the est2genome >>>>>>> predictor is turned on, implicitly. 
>>>>>>> Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.
>>>>>>>
>>>>>>> In general, I think this maker_coor and est_forward feature set is worth promoting to a documented feature.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>>>>
>>>>>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.
>>>>>>>
>>>>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, since I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
>>>>>>>
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).
>>>>>>> MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
>>>>>>>
>>>>>>> As you can see, the behavior is slightly different (because it's an accidental feature).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
>>>>>>>
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>>>
>>>>>>> Yes. That should work as well as an accidental feature.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:
>>>>>>>
>>>>>>> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>>>
>>>>>>> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
>>>>>>> The option won't already be there, so you'll have to type it in.
>>>>>>>
>>>>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>>>>
>>>>>>> This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> From: Shaun Jackman
>>>>>>> Reply-To: Shaun Jackman
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To:
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
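Carson's description of the undocumented FASTA header tags can be made concrete with a small helper for preparing evidence files. This is a minimal Python sketch, not MAKER code: the helper names `add_maker_tags` and `parse_maker_coor` are hypothetical, and it assumes the tags are appended to the header as whitespace-separated `key=value` tokens, which the thread only shows by example. MAKER's own header parsing (the thread points to GI.pm) is the authority on the exact format.

```python
import re
from typing import Optional, Tuple

# "chr1" or "chr1:1-10000"; the coordinate part is optional.
COOR_RE = re.compile(r"^(?P<seq>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def add_maker_tags(header: str, gene_id: Optional[str] = None,
                   coor: Optional[str] = None) -> str:
    """Append gene_id= and maker_coor= tags to a FASTA header line.

    Assumption: tags are whitespace-separated key=value tokens after the ID.
    The thread shows the tag names, not the exact separator MAKER expects.
    """
    parts = [header.rstrip()]
    if gene_id is not None:
        parts.append(f"gene_id={gene_id}")
    if coor is not None:
        if COOR_RE.match(coor) is None:
            raise ValueError(f"malformed maker_coor value: {coor!r}")
        parts.append(f"maker_coor={coor}")
    return " ".join(parts)


def parse_maker_coor(header: str) -> Optional[Tuple[str, Optional[int], Optional[int]]]:
    """Return (seqid, start, end) from a maker_coor= tag, or None if absent.

    start and end are None when only a sequence name was given, i.e. the
    sequence may be mapped anywhere on that chromosome/contig.
    """
    for token in header.split():
        if token.startswith("maker_coor="):
            m = COOR_RE.match(token[len("maker_coor="):])
            if m is None:
                raise ValueError(f"malformed maker_coor tag: {token!r}")
            start = int(m["start"]) if m["start"] else None
            end = int(m["end"]) if m["end"] else None
            return m["seq"], start, end
    return None
```

For example, `add_maker_tags(">rrn16 16S ribosomal RNA", gene_id="rrn16", coor="chr1:1-10000")` yields `>rrn16 16S ribosomal RNA gene_id=rrn16 maker_coor=chr1:1-10000`, and a bare `maker_coor=chr1` parses to `("chr1", None, None)`, the whole-sequence case Carson warns is very slow.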
>>>>>>>
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From sbrubaker at solazyme.com Thu Mar 6 17:41:55 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Thu, 6 Mar 2014 23:41:55 +0000
Subject: [maker-devel] Long introns from Augustus
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com>

Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus. I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created?

I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum. So far I don't really see any changes though.

Any advice would be greatly appreciated.

Thanks,
Shane

From carsonhh at gmail.com Thu Mar 6 17:46:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:46:53 -0700
Subject: [maker-devel] Long introns from Augustus
Message-ID:

Are these the ab initio calls that are merged, or final MAKER models?
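Shane's question above is about capping intron length in Augustus output. One option that does not depend on Augustus's probability files is to screen the predicted models afterwards and flag transcripts whose longest intron exceeds a cutoff, so likely fusions can be reviewed or split. The Python sketch below is an illustration only, not an Augustus or MAKER feature: it assumes GFF3 input in which `exon` rows (or `CDS` rows, via the `feature` argument) carry a `Parent=` attribute, and the function names and the 1,000 bp cutoff in the demo are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # GFF-style 1-based, inclusive coordinates


def features_by_parent(gff_lines: Iterable[str],
                       feature: str = "exon") -> Dict[str, List[Interval]]:
    """Collect (start, end) of one feature type, grouped by Parent ID."""
    groups: Dict[str, List[Interval]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = dict(kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                groups[parent].append((int(cols[3]), int(cols[4])))
    return groups


def intron_lengths(exons: List[Interval]) -> List[int]:
    """Gap lengths between consecutive exons, after sorting by position."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def long_intron_transcripts(gff_lines: Iterable[str], max_intron: int,
                            feature: str = "exon") -> Dict[str, int]:
    """Map transcript ID -> longest intron, only for transcripts over max_intron."""
    flagged = {}
    for tid, exons in features_by_parent(gff_lines, feature).items():
        longest = max(intron_lengths(exons), default=0)  # single-exon -> 0
        if longest > max_intron:
            flagged[tid] = longest
    return flagged


if __name__ == "__main__":
    demo = [
        "##gff-version 3",
        "chr1\taugustus\texon\t100\t200\t.\t+\t.\tParent=g1.t1",
        "chr1\taugustus\texon\t301\t400\t.\t+\t.\tParent=g1.t1",
        "chr1\taugustus\texon\t5000\t5100\t.\t+\t.\tParent=g1.t1",
        "chr1\taugustus\texon\t6000\t6500\t.\t+\t.\tParent=g2.t1",
    ]
    print(long_intron_transcripts(demo, max_intron=1000))  # {'g1.t1': 4599}
```

A post-hoc screen like this only reports suspect models; it does not change what Augustus predicts, so it complements rather than replaces tuning the predictor itself.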
--Carson

From sbrubaker at solazyme.com Thu Mar 6 18:48:15 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 7 Mar 2014 00:48:15 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To:
References:
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>

Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data.

I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.

From mikael.durling at slu.se Mon Mar 10 05:27:25 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 10:27:25 +0000
Subject: [maker-devel] keep_preds values
Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael

From robert.king at rothamsted.ac.uk Mon Mar 10 07:17:07 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Mon, 10 Mar 2014 12:17:07 +0000
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically.

I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?
I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it?

I've got the Eval program and converted to gtf format using the provided script; I'm just waiting on some perl modules to be installed by our administrator to test it. I'm also waiting on those modules to test out the "Evaluator" and "compare" programs; what do they do?

Best Wishes
Rob

From dence at genetics.utah.edu Mon Mar 10 09:47:42 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 10 Mar 2014 14:47:42 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
Message-ID:

Hi Mikael,

The keep_preds parameter is often used the same as a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 to 1. A prediction with an AED of 0 matches the evidence exactly, while a prediction with an AED of 1 isn't overlapped by any evidence.

The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc).
Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model.

From a user standpoint, you will almost certainly lose gene models when you set AED at an intermediate value, but you might benefit by knowing that all your models will now have an AED of at least a certain value.

I hope that helps; let me know if it didn't.

~Daniel

PS The original paper that described the AED is Eilbeck et al. in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012.

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Mon Mar 10 10:51:21 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:51:21 -0700
Subject: [maker-devel] keep_preds values
Message-ID:

Actually that is false. The keep_preds option is still binary. Any value other than 0 sets it to true. There was discussion about making it a non-binary value, but that has not been implemented.
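Daniel's description of AED and Carson's correction about keep_preds can be illustrated with a small calculation. The Python sketch below computes a simplified, nucleotide-level AED as 1 - (SN + SP) / 2, the definition from the Eilbeck et al. (2009) paper Daniel cites, for one prediction against one set of evidence intervals; MAKER's own implementation combines evidence from several sources and may differ in detail. `keep_prediction` encodes only the binary keep_preds behavior Carson describes (every other filter is ignored), and `aed_cdf` gives a cumulative AED distribution, one simple way to summarize and compare two annotation sets. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # GFF-style 1-based, inclusive coordinates


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bases(intervals: Iterable[Interval]) -> int:
    """Total number of bases covered by an interval set."""
    return sum(end - start + 1 for start, end in merge(intervals))


def shared_bases(a: Iterable[Interval], b: Iterable[Interval]) -> int:
    """Number of bases covered by both interval sets."""
    ma, mb = merge(a), merge(b)
    return sum(max(0, min(e1, e2) - max(s1, s2) + 1)
               for s1, e1 in ma for s2, e2 in mb)


def aed(prediction: Sequence[Interval], evidence: Sequence[Interval]) -> float:
    """AED = 1 - (SN + SP) / 2, computed at the nucleotide level."""
    shared = shared_bases(prediction, evidence)
    if shared == 0:
        return 1.0  # not overlapped by any evidence
    sn = shared / bases(evidence)    # fraction of the evidence the prediction covers
    sp = shared / bases(prediction)  # fraction of the prediction the evidence covers
    return 1.0 - (sn + sp) / 2.0


def keep_prediction(aed_value: float, keep_preds: float) -> bool:
    """keep_preds as Carson describes it: any non-zero value acts as 'true'.

    Simplified: other filters (single_exon, length cutoffs, ...) are ignored.
    """
    if keep_preds != 0:
        return True
    return aed_value < 1.0  # default: needs at least some evidence support


def aed_cdf(values: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Fraction of gene models with AED <= each threshold."""
    if not values:
        raise ValueError("no AED values given")
    return [sum(v <= t for v in values) / len(values) for t in thresholds]
```

For instance, a prediction spanning bases 1-100 against evidence covering 1-50 has SN = 1.0 and SP = 0.5, so its AED is 0.25. Plotting `aed_cdf` for two annotation runs over the same thresholds gives curves where the set that rises faster has more models closely supported by evidence.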
--Carson

From mikael.durling at slu.se Mon Mar 10 09:57:23 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 14:57:23 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To:
References:
Message-ID:

Hi Carson and Daniel,

That sounds more logical to me. Then it would be appropriate to change the comment of keep_preds in the generated config files.

Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsupported model?

Mikael

From carsonhh at gmail.com Mon Mar 10 10:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To:
References:
Message-ID:

Yes. It will eventually perform an AED-like calculation between multiple predictors (i.e. if you use 3 predictors, then you require support by at least 2 predictors across all exons to get a value of 0.33). A value of 0 would be perfect concordance across all 3 predictors.

--Carson

From mikael.durling at slu.se Mon Mar 10 10:08:16 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To:
References:
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael

From carsonhh at gmail.com Mon Mar 10 11:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID:

There is a value called abAED being calculated, which somewhat captures the concordance among the predictors. It is not currently printed in the GFF3, but it is used to identify the best non-overlapping ab initio predictor to put in the non-overlapping fasta file. There are a couple of things I still need to do with it, though. It's not yet normalized to take into account the absence of a predictor in the cluster of overlapping predictions.
For example, if I have 2 predictors and 2 make perfectly matching calls and 1 makes no call, they get a score of 0 before I have perfect concordance between what?s there, but I really should make it 0.33 because the abscence of the third predictor is meaningful. The unnormalized concordance value is fine for deciding which overlapping model to keep in the file, but not for global comparison. ?Carson On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" wrote: >Ok. But that is not implemented no as far as I can tell from the source, >right? Or is it reflected in the AED for the unsupported models? > >Mikael > >10 mar 2014 kl. 16:59 skrev Carson Holt : > >> Yes. It will eventually perform an AED like calculation between >>multiple >> predictors (i.e. if you use 3 predictors it, then you require support by >> at least 2 predictors across all exons to get a value of 0.33). A value >> of 0 would be perfect concordance across all 3 predictors. >> >> ?Carson >> >> >> >> >> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" >> wrote: >> >>> Hi Carson and Daniel, >>> >>> That sounds more logical to me. Then it would be appropriate to change >>> the comment of keep_preds in the generated config files. >>> >>> Would it make sense to make keep_preds a non-binary value to evaluate >>>the >>> concordance between ab initio models obtained from different >>>predictors? >>> That would assume that it is less likely to be a false positive when >>>two >>> or more predictors suggest the same unsported model? >>> >>> Mikael >>> >>> >>> 10 mar 2014 kl. 16:51 skrev Carson Holt : >>> >>>> Actually that is false. The keep_preds option is still binary. Any >>>> value >>>> other than 0 sets it to true. There was discussion about making it a >>>> non-binary value, but that has not been implemented. 
>>>> >>>> --Carson >>>> >>>> >>>> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >>>> >>>>> Hi Mikael, >>>>> >>>>> The keep_preds parameter is often used the same as a binary >>>>>parameter, >>>>> but it doesn't have to be. The concordance that is mentioned in the >>>>> comment line is the AED for that prediction. AED is a measurement of >>>>> how >>>>> well a prediction is supported by the evidence and ranges from 0 - >>>>>1. A >>>>> prediction with an AED of 0 matches the evidence exactly while a >>>>> prediction with an AED of 1 isn't overlapped by any evidence. >>>>> >>>>> The default behavior for MAKER is to make a gene model out of a >>>>> prediction with any AED <1. When you change the keep_preds option >>>>>from >>>>> 0 >>>>> to 1, then MAKER will make a gene model out of any prediction that >>>>> matches the other parameters (like single_exon, min_exon, etc). >>>>>Setting >>>>> the keep_preds option to somewhere in between 0 and 1 will set a >>>>> ceiling >>>>> on the AED required for promoting a prediction to a gene model. >>>>> >>>>> From a user standpoint, you will almost certainly lose gene >>>>>models >>>>> when you set AED at an intermediate value, but you might benefit by >>>>> knowing that all your models will now have an AED of at least a >>>>>certain >>>>> value. >>>>> >>>>> I hope that helps; let me know if it didn't. >>>>> >>>>> ~Daniel >>>>> >>>>> PS The original paper that described the AED is Eilbeck et al in BMC >>>>> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>>>> Genetics paper from 2012.
>>>>> >>>>> Daniel Ence >>>>> Graduate Student >>>>> Eccles Institute of Human Genetics >>>>> University of Utah >>>>> 15 North 2030 East, Room 2100 >>>>> Salt Lake City, UT 84112-5330 >>>>> ________________________________________ >>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>> Mikael Brandström Durling [mikael.durling at slu.se] >>>>> Sent: Monday, March 10, 2014 4:27 AM >>>>> To: maker-devel at yandell-lab.org >>>>> Subject: [maker-devel] keep_preds values >>>>> >>>>> Hi, >>>>> >>>>> Can someone, please, explain the keep_preds parameter, as it works >>>>>now >>>>> with a value between 1 and 0? It used to be binary, but now it seems >>>>>to >>>>> test concordance towards something. The maker wiki doesn't explain it >>>>> any >>>>> further either. >>>>> >>>>> Thanks, >>>>> Mikael >>>> >>>> >>> >> >> > From carsonhh at gmail.com Mon Mar 10 11:18:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:18:14 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Message-ID: Sorry, meant to say "3 predictors and 2 make perfectly matching calls and 1 makes no call." On 3/10/14, 9:16 AM, "Carson Holt" wrote: >There is a value called abAED being calculated, which somewhat captures >the concordance among the predictors. It is not currently printed in the >GFF3, but it is used to identify the best non-overlapping ab initio >predictor to put in the non-overlapping fasta file.
There are a couple of >things I still need to do with it, though. It's not yet normalized to >take into account the absence of a predictor in the cluster of overlapping >predictions. For example, if I have 2 predictors and 2 make perfectly >matching calls and 1 makes no call, they get a score of 0 because I have >perfect concordance between what's there, but I really should make it 0.33 >because the absence of the third predictor is meaningful. The >unnormalized concordance value is fine for deciding which overlapping >model to keep in the file, but not for global comparison. > >--Carson > > > >On 3/10/14, 8:08 AM, "Mikael Brandström Durling" >wrote: > >>Ok. But that is not implemented now as far as I can tell from the source, >>right? Or is it reflected in the AED for the unsupported models? >> >>Mikael >> >>On 10 Mar 2014, at 16:59, Carson Holt wrote: >> >>> Yes. It will eventually perform an AED-like calculation between >>>multiple >>> predictors (i.e. if you use 3 predictors, then you require support >>>by >>> at least 2 predictors across all exons to get a value of 0.33). A >>>value >>> of 0 would be perfect concordance across all 3 predictors. >>> >>> --Carson >>> >>> >>> >>> >>> On 3/10/14, 7:57 AM, "Mikael Brandström Durling" >>> >>> wrote: >>> >>>> Hi Carson and Daniel, >>>> >>>> That sounds more logical to me. Then it would be appropriate to >>>>change >>>> the comment of keep_preds in the generated config files. >>>> >>>> Would it make sense to make keep_preds a non-binary value to evaluate >>>>the >>>> concordance between ab initio models obtained from different >>>>predictors? >>>> That would assume that it is less likely to be a false positive when >>>>two >>>> or more predictors suggest the same unsupported model? >>>> >>>> Mikael >>>> >>>> >>>> On 10 Mar 2014, at 16:51, Carson Holt wrote: >>>> >>>>> Actually that is false. The keep_preds option is still binary. Any >>>>> value >>>>> other than 0 sets it to true.
There was discussion about making it a >>>>> non-binary value, but that has not been implemented. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >>>>> >>>>>> Hi Mikael, >>>>>> >>>>>> The keep_preds parameter is often used the same as a binary >>>>>>parameter, >>>>>> but it doesn't have to be. The concordance that is mentioned in the >>>>>> comment line is the AED for that prediction. AED is a measurement of >>>>>> how >>>>>> well a prediction is supported by the evidence and ranges from 0 - >>>>>>1. A >>>>>> prediction with an AED of 0 matches the evidence exactly while a >>>>>> prediction with an AED of 1 isn't overlapped by any evidence. >>>>>> >>>>>> The default behavior for MAKER is to make a gene model out of a >>>>>> prediction with any AED <1. When you change the keep_preds option >>>>>>from >>>>>> 0 >>>>>> to 1, then MAKER will make a gene model out of any prediction that >>>>>> matches the other parameters (like single_exon, min_exon, etc). >>>>>>Setting >>>>>> the keep_preds option to somewhere in between 0 and 1 will set a >>>>>> ceiling >>>>>> on the AED required for promoting a prediction to a gene model. >>>>>> >>>>>> From a user standpoint, you will almost certainly lose gene >>>>>>models >>>>>> when you set AED at an intermediate value, but you might benefit by >>>>>> knowing that all your models will now have an AED of at least a >>>>>>certain >>>>>> value. >>>>>> >>>>>> I hope that helps; let me know if it didn't. >>>>>> >>>>>> ~Daniel >>>>>> >>>>>> PS The original paper that described the AED is Eilbeck et al in BMC >>>>>> Bioinformatics 2009. It's also discussed in more detail in the >>>>>>MAKER2 >>>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>>>>> Genetics paper from 2012.
>>>>>> >>>>>> Daniel Ence >>>>>> Graduate Student >>>>>> Eccles Institute of Human Genetics >>>>>> University of Utah >>>>>> 15 North 2030 East, Room 2100 >>>>>> Salt Lake City, UT 84112-5330 >>>>>> ________________________________________ >>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>>> Mikael Brandström Durling [mikael.durling at slu.se] >>>>>> Sent: Monday, March 10, 2014 4:27 AM >>>>>> To: maker-devel at yandell-lab.org >>>>>> Subject: [maker-devel] keep_preds values >>>>>> >>>>>> Hi, >>>>>> >>>>>> Can someone, please, explain the keep_preds parameter, as it works >>>>>>now >>>>>> with a value between 1 and 0? It used to be binary, but now it seems >>>>>>to >>>>>> test concordance towards something. The maker wiki doesn't explain >>>>>>it >>>>>> any >>>>>> further either. >>>>>> >>>>>> Thanks, >>>>>> Mikael >>>>> >>>>> >>>> >>> >>> >> > > From carsonhh at gmail.com Mon Mar 10 11:25:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:25:50 -0700 Subject: [maker-devel] annotation comparison aed plots Message-ID: I don't know about Michael's script, but I've always used eval. It produces sensitivity/specificity metrics. It assumes the first models are 100% correct, and then tells you the sensitivity/specificity value for the second models. It is therefore not a quality metric. Instead you should view it as a change metric.
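As a rough sketch of the "change metric" idea: treat the first annotation as 100% correct and score the second against it. This is not how eval or compare_annotations_3.2.pl are implemented; it borrows Mike's gene-level rule from later in the thread (two genes match if any pair of their exons overlaps by at least one nucleotide), and the `(seqid, start, end)` exon tuples and the function names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive (assumption)
Gene = List[Exon]


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one nucleotide on the same sequence."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_overlap(g: Gene, h: Gene) -> bool:
    """Gene-level match: any exon of g overlaps any exon of h."""
    return any(exons_overlap(x, y) for x in g for y in h)


def change_metrics(reference: List[Gene], other: List[Gene]) -> Tuple[float, float]:
    """Score `other` against `reference`, which is assumed to be 100% correct."""
    matched_ref = sum(1 for g in reference if any(genes_overlap(g, h) for h in other))
    matched_other = sum(1 for h in other if any(genes_overlap(h, g) for g in reference))
    sensitivity = matched_ref / len(reference)  # drops when genes were lost
    specificity = matched_other / len(other)    # drops when genes were gained
    return sensitivity, specificity
```

The brute-force all-pairs loop keeps the sketch short; for a whole genome you would want an interval index per sequence instead.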
Lower sensitivity tells you that models/exons have been lost between versions, and lower specificity tells you models/exons have been gained. There will also be a lot of generic statistics on exon/intron distribution and UTR length. Then the AED values from the MAKER run can be used independently to evaluate how well models match the evidence. --Carson From: "Robert King (RRes-Roth)" Date: Monday, March 10, 2014 at 5:17 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] annotation comparison aed plots Dear Maker Developers, I've updated a reference that had errors and was a little incomplete and am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: Uniprot proteins search using species keyword "fusarium"; Pubmed mRNA for the name of the organism; Prior annotation reference transcripts. Annotation 2: Uniprot proteins search using species keyword "fusarium"; Pubmed mRNA for the name of the organism; Prior annotation reference transcripts; mRNA trinity assembly pasafly of different strain (only RNA-seq available). I'm not sure if it was a smart move to use the prior annotation reference transcripts? I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post Mike Campbell has a script to produce these, do you know if he will post it? I've got the Eval program and converted to gtf format using the provided script, just waiting on some perl modules to be installed by admin to test it. I'm waiting on some perl modules to be installed by our administrator to test out the "Evaluator" and "compare" programs too, what do they do? Best Wishes Rob -- This message has been scanned for viruses and dangerous content by MailScanner , and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses.
However, you are fully responsible for performing any virus scanning. From michael.s.campbell1 at gmail.com Mon Mar 10 10:50:53 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:50:53 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: One more point. The sensitivity, specificity, and accuracy produced by the compare_annotations_3.2.pl script are gene level, and overlap between annotation sets is defined very liberally, as at least one nucleotide of exon overlap. Mike On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Robert, > > Here are the scripts that were mentioned before. > > The AED_cdf_generator.pl script is for making cumulative distribution > function plots based on annotation edit distance. This script is quite > simple and straightforward in its internals. > > The compare_annotations_3.2.pl script is for generating summary stats for > annotations and will compare two annotations of the same assembly.
I've produced two annotations using as >> evidence: >> >> >> >> Annotation 1: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> >> >> Annotation 2: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> mRNA trinity assembly pasafly of different strain (only RNA-seq available) >> >> >> >> I'm not sure if it was a smart move to use the prior annotation reference >> transcripts? >> >> >> >> I want to compare these two annotations and have produced AED scores. How >> do I generate summary stats/figures to compare annotations. You mentioned >> last year in a post Mike Campbell has a script to produce these, do you >> know if he will post it? I've got the Eval program and converted to gtf >> format using the provided script, just waiting on some perl modules to be >> installed by admin to test it. I'm waiting on some perl modules to be >> installed by our administrator to test out the "Evaluator" and "compare" >> programs too, what do they do? >> >> >> >> Best Wishes >> >> Rob >> >> -- >> This message has been scanned for viruses and >> dangerous content by *MailScanner* , and >> we believe but do not warrant that this e-mail and any attachments >> thereto do not contain any viruses. However, you are fully responsible for >> performing any virus scanning. >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Michael Campbell MS, RD. > Doctoral Candidate > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:585-3543 > > -- Michael Campbell MS, RD. 
Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 From cjfields at illinois.edu Mon Mar 10 10:52:50 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 10 Mar 2014 15:52:50 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in # models overall, which is worse. GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when used with both Augustus and SNAP or with Augustus alone. I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past. This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first. Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add these like a 'plugin'? chris From carsonhh at gmail.com Mon Mar 10 12:05:24 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 10:05:24 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints"
from evidence alignments (for example we are working on EVM integration right now: http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't find loss of any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented. Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when used with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked with in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a 'plugin'?
> >chris From michael.s.campbell1 at gmail.com Mon Mar 10 10:47:50 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:47:50 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: Hi Robert, Here are the scripts that were mentioned before. The AED_cdf_generator.pl script is for making cumulative distribution function plots based on annotation edit distance. This script is quite simple and straightforward in its internals. The compare_annotations_3.2.pl script is for generating summary stats for annotations and will compare two annotations of the same assembly. You can run either script without arguments to get a usage statement. Thanks, Mike On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) < robert.king at rothamsted.ac.uk> wrote: > Dear Maker Developers, > > > > I've updated a reference that had errors and was a little incomplete > and am now trying to produce an annotation for it. Please note the reference > has not changed dramatically. I've produced two annotations using as > evidence: > > > > Annotation 1: > > Uniprot proteins search using species keyword "fusarium" > > Pubmed mRNA for the name of the organism > > Prior annotation reference transcripts > > > > Annotation 2: > > Uniprot proteins search using species keyword "fusarium" > > Pubmed mRNA for the name of the organism > > Prior annotation reference transcripts > > mRNA trinity assembly pasafly of different strain (only RNA-seq available) > > > > I'm not sure if it was a smart move to use the prior annotation reference > transcripts?
> > > > I want to compare these two annotations and have produced AED scores. How > do I generate summary stats/figures to compare annotations? You mentioned > last year in a post Mike Campbell has a script to produce these, do you > know if he will post it? I've got the Eval program and converted to gtf > format using the provided script, just waiting on some perl modules to be > installed by admin to test it. I'm waiting on some perl modules to be > installed by our administrator to test out the "Evaluator" and "compare" > programs too, what do they do? > > > > Best Wishes > > Rob > > -- Michael Campbell MS, RD. Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- A non-text attachment was scrubbed... Name: AED_cdf_generator.pl Type: text/x-perl-script Size: 2579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl Type: text/x-perl-script Size: 29154 bytes Desc: not available URL: From sajeet at gmail.com Mon Mar 10 13:31:40 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Mon, 10 Mar 2014 11:31:40 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: > Adding a new predictor can take some time. It obviously requires some > coding. It's usually not too hard just to convert results to GFF3 and > then pass it in. Integrated support is really only beneficial for > predictors that can take "hints" from evidence alignments (for example we > are working on EVM integration right now: > http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give > problems, just drop them. GeneMark really doesn't work very well on > genomes with complex intron/exon structure (and I really wouldn't use it > for anything but fungi). > > Make sure you are also giving sufficient protein evidence, perhaps all > proteins from chicken and pigeon, for example. Then you shouldn't find > loss of any true genes if just using Augustus. Also try not to use gene > count as an indicator of performance. The value is very deceptive, > especially if the genome assembly is fragmented. > > Thanks, > Carson > > > > On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: > > >I have been running MAKER 2.31 using Augustus and SNAP on an avian > >genome. Augustus gives pretty decent gene model predictions based on a > >custom model we have and the hints MAKER provides. However, SNAP seems > >to throw out a ton of false positives; in many cases this appears to > >cause erroneous gene fusions.
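Sajeet's workaround amounts to converting a soft-masked genome (repeats in lowercase, as in RepeatMasker's -xsmall output) into a hard-masked one (repeats replaced by N) before GeneMark training. A minimal Python sketch, assuming lowercase marks every masked base; `hard_mask_fasta` and `hard_mask_file` are hypothetical helpers, the file names in the usage line are placeholders, and nothing here touches GeneMark's own options.

```python
import re
from typing import Iterable, Iterator

# Assumption: every soft-masked base is written in lowercase (a, c, g, t, n, ...).
_LOWERCASE = re.compile(r"[a-z]")


def hard_mask_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by 'N'."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines (which may contain lowercase IDs) are left untouched
        else:
            yield _LOWERCASE.sub("N", line)


def hard_mask_file(in_path: str, out_path: str) -> None:
    """Stream a soft-masked FASTA file into a hard-masked copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(hard_mask_fasta(src))
```

Usage, with placeholder names: `hard_mask_file("genome.softmasked.fasta", "genome.hardmasked.fasta")`, then point only the GeneMark training step at the hard-masked copy.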
Leaving out SNAP altogether, however, leads > >to a marked decrease in # models overall, which is worse. GeneMark had a > >very similar problem (high # false positives) and thus no marked > >improvement, either when used with both Augustus and SNAP or with > >Augustus alone. > > > >I have been exploring using geneid > >(http://genome.crg.es/software/geneid/) as an alternative, based on some > >feedback on another project I worked with in the past. This would be > >fed into MAKER using external GFF, but I wanted to see if anyone has > >tried geneid with MAKER first. > > > >Finally, how hard would it be to incorporate alternative callers into > >MAKER? For instance, would it be possible to add these like a 'plugin'? > > > >chris From carsonhh at gmail.com Mon Mar 10 23:13:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 22:13:43 -0600 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
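For a compact genome like Shane's, one way to check whether changed settings actually reduce fused-looking models is to list the models whose longest intron exceeds a cutoff. A minimal Python sketch: it only diagnoses and does not change how Augustus or MAKER predict; it assumes each model's exon coordinates (1-based, inclusive, same strand) have already been parsed from the GFF3, and the helper names, model IDs, and 1000 bp cutoff in the test are hypothetical.

```python
from typing import Dict, List, Tuple


def intron_lengths(exons: List[Tuple[int, int]]) -> List[int]:
    """Gaps between consecutive exons of one model, in bp (1-based inclusive coords)."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def models_with_long_introns(models: Dict[str, List[Tuple[int, int]]], max_intron: int) -> List[str]:
    """IDs of models containing at least one intron longer than `max_intron`."""
    return sorted(
        name
        for name, exons in models.items()
        if any(length > max_intron for length in intron_lengths(exons))
    )
```

Comparing the size of this list before and after a settings change gives a quick, if crude, signal of whether fusions are going down.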
--Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged, or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that someone >> had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane > > From darasappan at gmail.com Mon Mar 10 15:14:03 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 10 Mar 2014 15:14:03 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing Message-ID: Hello, I've been running maker with different assembly files, reference files, etc. and I check the output by: 1. concatenating the gff files 2. concatenating the *transcripts.fasta files 3.
concatenating the *proteins.fasta files I'm noticing that when I ran maker twice with the same parameters, the second time around, many of the output subdirectories do not have a *transcripts.fasta or *proteins.fasta file in them. There are 251 subdirectories and only 97 of them have all 3 output files. Maker log looks ok to me, but I've attached it here as well. What could be the reason for this? Thanks dhivya -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.o1813247.gz Type: application/x-gzip Size: 13857217 bytes Desc: not available URL: From sbrubaker at solazyme.com Tue Mar 11 12:06:57 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Tue, 11 Mar 2014 17:06:57 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com> Ok, thank you. -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, March 10, 2014 9:14 PM To: Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Long introns from Augustus Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes. --Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?
I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged, or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that >> someone had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane > > From carson.holt at genetics.utah.edu Thu Mar 13 11:00:06 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 13 Mar 2014 16:00:06 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Just resending this to the correct maker-devel address. Please, when replying, do not CC the incorrect maker-devel-bounce address. Thanks, Carson On 3/13/14, 9:56 AM, "Carson Holt" wrote: >FGENESH is not a heavily used tool, so depending on which version it is >(either too old or too new), output might be slightly different, which >could cause incorrect parsing.
Could you tar up your maker.output folder, >and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >(send me your user/guest ID after you upload). > >For the BLAST error, use BLAST+ instead. You are using blastall, which is >the old legacy version of NCBI BLAST. You can do this by setting the >blast type in maker_bopts.ctl and the location of executables in >maker_exe.ctl. > >Thanks, >Carson > > > >On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote: > >>Dear Maker users >> >> >>I ran maker (2.31) on a fungal genome and found out that it inserted the >>word SCALAR followed by a pair of brackets like this (0x22de7020) >>inserted in the nucleotide sequence of some of the genes. This seems to >>be related to transcripts predicted by fgenesh_masked. >> >> >>Here is an example for one of the genes: >> >> >>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>TTTCGACAAGC >> >>The same genome sequence was used for the first round of maker (2.10) >>without such a problem.
I checked the sequence for the scaffold related to >>one of the affected transcripts, and there was no error in the sequence. >>I am not sure what is causing this. The only error that I could spot in >>the output error file is the following: >> >> >>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>contexts/frames of query sequences. >> >> >> >>Your help is appreciated. >> >> >> >>HB >> >> >> >> >> >> > From carsonhh at gmail.com Thu Mar 13 11:14:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:14:54 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: Note protein/transcript FASTA files are only created when there are gene models to output to those files (so their absence means there were no gene models for that contig). Most sequences without protein/transcript FASTA files in your sample are very short and thus don't contain anything. What is left either has no est2genome results, or the est2genome alignments do not have a sufficient open reading frame to be turned into a gene model (false merging of regions by Trinity can cause this, so make sure you use the Jaccard index option when assembling reads with Trinity to avoid this). You are using only the est2genome=1 option. This will result in a limited set of genes that can be used for training SNAP/Augustus (so not getting results on all contigs is expected). You really won't get much as far as results until you have one of the ab initio predictors turned on. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 11, 2014 at 8:52 AM To: Carson Holt Cc: Daniel Ence Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Alright, done. My username is daras Thanks Dhivya On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > Input and compressed file of output.
> > Thanks, > Carson > > From: dhivya arasappan > Date: Monday, March 10, 2014 at 2:09 PM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Do you mean the whole maker output? > > Thanks > dhivya > > On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: > >> Could you upload everything here --> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then send us the link generated or your user ID. >> >> Thanks, >> Carson >> >> >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 1:50 PM >> To: Carson Holt, Daniel Ence >> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson and Daniel, >> >> I'm sending this across to you separately since the maker list is blocking my >> email due to attachment size. >> >> As always, thanks for any guidance you can provide. >> Dhivya >> >> >> Begin forwarded message: >> >>> From: dhivya arasappan >>> Date: March 10, 2014 3:14:03 PM CDT >>> To: maker-devel at yandell-lab.org >>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>> >>> >>> Hello, >>> >>> I've been running maker with different assembly files, reference files, etc., >>> and I check the output by: >>> >>> 1. concatenating the gff files >>> 2. concatenating the *transcripts.fasta files >>> 3. concatenating the *proteins.fasta files >>> >>> I'm noticing that when I ran maker twice with the same parameters, the second >>> time around, many of the output subdirectories do not have a >>> *transcripts.fasta or *proteins.fasta file in them. >>> There are 251 subdirectories and only 97 of them have all 3 output files. >>> Maker log looks ok to me, but I've attached it here as well. >>> >>> What could be the reason for this? >>> >>> Thanks >>> dhivya >>> >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Thu Mar 13 11:55:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:55:40 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Message-ID: The second time, it should have just started where it left off, so it would run faster (because the processing from the previous job counted towards the second one). The archived output you sent me had 21,183 proteins and transcripts. If you are using fasta_merge to collect them, just make sure the datastore.index file is not truncated or corrupt; otherwise it won't collect all the FASTA files from every contig. You can rebuild the datastore.index using the -dsindex flag with MAKER, if you want to check that. You can also have maker just regenerate results without rerunning BLAST etc. by using the -a flag, if you want to recalculate all results quickly (it rebuilds all FASTA and GFF3 files without redoing most of the analysis). -Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 10:47 AM To: Carson Holt Cc: Daniel Ence, "maker-devel at yandell-lab.org" Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Thanks, Carson, for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ESTs based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with more than 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: only 500+ genes and transcripts.
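Carson's advice above, to make sure the datastore index is not truncated or corrupt before relying on fasta_merge, could be checked with a short script such as this hypothetical Python helper (not part of MAKER). It assumes each line of the master_datastore_index.log is tab-separated into contig ID, datastore directory and status (for example STARTED or FINISHED), and that a later line for the same contig supersedes an earlier one; please confirm that layout against your own index file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_index(lines: Iterable[str]) -> Tuple[Dict[str, List[str]], int]:
    """Group contig IDs by their most recent status in a datastore index log.

    Returns (contigs grouped by status, number of malformed lines).
    """
    last_status: Dict[str, str] = {}
    malformed = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            # Fewer than three columns may indicate a truncated write.
            malformed += 1
            continue
        contig, status = fields[0], fields[2]
        last_status[contig] = status  # a later entry supersedes an earlier one
    by_status: Dict[str, List[str]] = defaultdict(list)
    for contig, status in last_status.items():
        by_status[status].append(contig)
    return dict(by_status), malformed
```

Contigs whose last recorded status is still STARTED were presumably interrupted, and a non-zero malformed count may point to truncation; in either case, rebuilding the index with the -dsindex flag Carson mentions would be the next thing to try.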
My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output. When I reran maker, just changing the time allocated, it completed much faster, but it is giving far fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something, although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript FASTA files are only created when there are gene models to > output to those files (so their absence means there were no gene models for > that contig). Most sequences without protein/transcript FASTA files in your sample > are very short and thus don't contain anything. What is left either has no > est2genome results, or the est2genome alignments do not have a sufficient open > reading frame to be turned into a gene model (false merging of regions by > Trinity can cause this, so make sure you use the Jaccard index option when > assembling reads with Trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a limited set > of genes that can be used for training SNAP/Augustus (so not getting results > on all contigs is expected). You really won't get much as far as results > until you have one of the ab initio predictors turned on.
> > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Alright, done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output. >> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here --> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt, Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since the maker list is blocking my >>> email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>>> >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference files, etc., >>>> and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3.
concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with the same parameters, the second >>>> time around, many of the output subdirectories do not have a >>>> *transcripts.fasta or *proteins.fasta file in them. >>>> There are 251 subdirectories and only 97 of them have all 3 output files. >>>> Maker log looks ok to me, but I've attached it here as well. >>>> >>>> What could be the reason for this? >>>> >>>> Thanks >>>> dhivya >>>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Thu Mar 13 11:47:25 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 13 Mar 2014 11:47:25 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Thanks, Carson, for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ESTs based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with more than 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: only 500+ genes and transcripts. My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output.
When I reran maker, just changing the time allocated, it completed much faster, but it is giving far fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something, although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript FASTA files are only created when there are gene > models to output to those files (so their absence means there were > no gene models for that contig). Most sequences without protein/ > transcript FASTA files in your sample are very short and thus don't > contain anything. What is left either has no est2genome results, or > the est2genome alignments do not have a sufficient open reading frame > to be turned into a gene model (false merging of regions by Trinity > can cause this, so make sure you use the Jaccard index option when > assembling reads with Trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a > limited set of genes that can be used for training SNAP/Augustus (so > not getting results on all contigs is expected). You really won't > get much as far as results until you have one of the ab initio > predictors turned on. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta > files missing > > Alright, done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output.
>> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta >> files missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt, Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta >>> files missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since the maker list is >>> blocking my email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference >>>> files, etc., and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3. concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with the same parameters, >>>> the second time around, many of the output subdirectories do not >>>> have a *transcripts.fasta or *proteins.fasta file in them. >>>> There are 251 subdirectories and only 97 of them have all 3 >>>> output files. Maker log looks ok to me, but I've attached it >>>> here as well. >>>> >>>> What could be the reason for this?
>>>> >>>> Thanks >>>> dhivya >>>> >>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: transcripts.cat.fasta.old.gz Type: application/x-gzip Size: 7927581 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proteins.cat.fasta.old.gz Type: application/x-gzip Size: 3668381 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 13 13:53:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 12:53:05 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> <672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com> <5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com> Message-ID: For future reference, I suggest using the .../maker/bin/fasta_merge tool to merge based on the datastore.index rather than other command-line methods. It will handle the multiple FASTA types that are produced in the results, and will validate against the datastore.index file. Example: fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log The same is also true when merging GFF3 files: gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log Thanks, Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 12:48 PM To: Carson Holt Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Ah, I forgot that some were called superscaffolds. That is a difference between the old and new assembly. This was definitely the issue. Thanks, and sorry for the mix-up.
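The ls | wc -l and grep -c '^>' counts discussed in this exchange can also be sketched in Python. This hypothetical helper assumes the datastore layout implied by those commands (two hashed directory levels, then one directory per contig) and uses a wildcard for the contig level, so it does not skip superscaffold* directories the way a sca* pattern does. Like the original *trans*fasta pattern, a plain "transcripts.fasta" suffix may also match per-predictor ab initio files if they are present; for the final merged output, fasta_merge -d with the master datastore index remains Carson's recommended route.

```python
import glob
import os
from typing import Tuple


def count_fasta_records(datastore: str, suffix: str = "transcripts.fasta") -> Tuple[int, int]:
    """Return (number of matching per-contig files, total FASTA records)."""
    # Assumed layout: <datastore>/<hash1>/<hash2>/<contig>/<files>.
    # Using "*" for the contig level matches scaffold*, superscaffold*, etc.
    pattern = os.path.join(datastore, "*", "*", "*", "*" + suffix)
    files = sorted(glob.glob(pattern))
    records = 0
    for path in files:
        with open(path) as handle:
            # Each FASTA record starts with a '>' header line.
            records += sum(1 for line in handle if line.startswith(">"))
    return len(files), records
```

Passing suffix="proteins.fasta" gives the corresponding protein counts; comparing the file count with the number of FINISHED contigs in the index shows how many contigs simply produced no gene models.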
Dhivya On Mar 13, 2014, at 12:51 PM, Carson Holt wrote: > Note that your command does not capture everything because not all scaffolds > start with the name "scaffold". > > This works, though --> > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, March 13, 2014 at 11:34 AM > To: Carson Holt > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Am I looking in the wrong place for my fasta files? I looked here: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l > > I see only 97 such files, so 97 contigs with transcripts.fasta files? > > When I count the number of sequences in all these files, I get 514 sequences. > > grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}' > > Could you tell me how and where you are getting the 21,183 transcripts? > > thanks > dhivya > > On Mar 13, 2014, at 12:21 PM, Carson Holt wrote: > >> This is what I see in your uploaded data. There are 21,183 transcripts from >> 201 contigs. Then there are 707 contigs with no gene models. >> >> -Carson >> >> >> From: Carson Holt >> Date: Thursday, March 13, 2014 at 11:11 AM >> To: dhivya arasappan >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> "as you saw from the output I uploaded before, the output certainly was much >> less than 20,000 transcripts" >> >> Actually there were 21,183 in the output you uploaded. I saw no loss of >> entries. >> >> -Carson >> >> From: dhivya arasappan >> Date: Thursday, March 13, 2014 at 11:09 AM >> To: Carson Holt >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> The datastore.index file looks fine; it has a started and finished status for >> my 980 scaffolds. I reran with increased time twice. Second time around, I
Second time around, I >> actually deleted the entire output directory to make sure it runs all over >> again. It still seemed to complete within a day. As you saw from the output >> I uploaded before, the output certainly was much less than 20,000 >> transcripts. Given that I was seeing great results for an older version of my >> assembly, I'm puzzled as to why my results are worse this time around. Any >> suggestions of what to check or what I can do to see improved results would >> be really helpful. >> >> I do know that I went from ~4% gaps to ~6% gaps in my new assembly- other >> than that, its better in every way. Could this cause just a dramatic >> difference in results? >> >> Thanks >> dhivya >> >> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote: >> >>> The second time, it should have just started where it left off, so it would >>> run faster (because the processing from the previous job counted towards the >>> second one). The archived output you sent me had 21,183 proteins and >>> transcripts. If you are using the fasta_merge to collect them, just make >>> sure the datastore.index file is not truncated or corrupt otherwise it won?t >>> collect all the fastas from every contig. You can rebuild the >>> datastore.index using the -dsindex flag with MAKER, if you want to check >>> that. Also you can have maker just regenerate results without rerunning >>> BLAST etc., by using the -a flag if you want to just recalculate ll results >>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis). >>> >>> ?Carson >>> >>> >>> From: dhivya arasappan >>> Date: Thursday, March 13, 2014 at 10:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Thanks Carson for the response. I understand that est2genome=1 does not use >>> any ab initio gene predictions, but simply identifies ests based on >>> alignment. 
>>> I'm a little confused because I ran maker on my assembly before, >>> using the same parameters (including est2genome=1). I got a very good >>> result with more than 20,000 transcripts and proteins. >>> >>> Then I was able to get an improved assembly, where many scaffolds were >>> combined into superscaffolds. So I reran maker on this assembly. Same >>> parameters, same transcriptome and proteins files. Now, I see such >>> drastically different results: only 500+ genes and transcripts. My >>> scaffolds are now bigger than before, so I'm not sure how this is happening. >>> These were the results I sent you. >>> >>> Another odd thing I noticed (and I am hesitant to report this because >>> perhaps it is due to some sort of error on my part): I ran maker on the >>> improved assembly the first time and maker did not complete in the 48 hours >>> I allocated. But I had 19,000+ transcripts in the unfinished output. When >>> I reran maker, just changing the time allocated, it completed much faster, >>> but it is giving far fewer transcripts and proteins as output. Could >>> something like this happen? If not, then I'm guessing I must have changed >>> something, although I'm pretty sure that I did not change anything other than >>> the time allocated. I've attached the transcripts and proteins files from the >>> first time I ran maker on my improved assembly. >>> >>> Thanks again for your help >>> Dhivya >>> >>> >>> >>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: >>> >>>> Note protein/transcript FASTA files are only created when there are gene models >>>> to output to those files (so their absence means there were no gene models >>>> for that contig). Most sequences without protein/transcript FASTA files in your >>>> sample are very short and thus don't contain anything.
>>>> What is left either >>>> has no est2genome results, or the est2genome alignments do not have a >>>> sufficient open reading frame to be turned into a gene model (false merging >>>> of regions by Trinity can cause this, so make sure you use the Jaccard >>>> index option when assembling reads with Trinity to avoid this). >>>> >>>> You are using only the est2genome=1 option. This will result in a limited >>>> set of genes that can be used for training SNAP/Augustus (so not getting >>>> results on all contigs is expected). You really won't get much as far as >>>> results until you have one of the ab initio predictors turned on. >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> From: dhivya arasappan >>>> Date: Tuesday, March 11, 2014 at 8:52 AM >>>> To: Carson Holt >>>> Cc: Daniel Ence >>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Alright, done. My username is daras >>>> >>>> Thanks >>>> Dhivya >>>> >>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: >>>> >>>>> Input and compressed file of output. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: dhivya arasappan >>>>> Date: Monday, March 10, 2014 at 2:09 PM >>>>> To: Carson Holt >>>>> Cc: Daniel Ence >>>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>>> missing >>>>> >>>>> Hi Carson, >>>>> >>>>> Do you mean the whole maker output? >>>>> >>>>> Thanks >>>>> dhivya >>>>> >>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >>>>> >>>>>> Could you upload everything here --> >>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>> >>>>>> Then send us the link generated or your user ID.
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: dhivya arasappan >>>>>> Date: Monday, March 10, 2014 at 1:50 PM >>>>>> To: Carson Holt, Daniel Ence >>>>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>>>>> missing >>>>>> >>>>>> Hi Carson and Daniel, >>>>>> >>>>>> I'm sending this across to you separately since the maker list is blocking my >>>>>> email due to attachment size. >>>>>> >>>>>> As always, thanks for any guidance you can provide. >>>>>> Dhivya >>>>>> >>>>>> >>>>>> Begin forwarded message: >>>>>> >>>>>>> From: dhivya arasappan >>>>>>> Date: March 10, 2014 3:14:03 PM CDT >>>>>>> To: maker-devel at yandell-lab.org >>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>>>>> missing >>>>>>> >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I've been running maker with different assembly files, reference files, >>>>>>> etc., and I check the output by: >>>>>>> >>>>>>> 1. concatenating the gff files >>>>>>> 2. concatenating the *transcripts.fasta files >>>>>>> 3. concatenating the *proteins.fasta files >>>>>>> >>>>>>> I'm noticing that when I ran maker twice with the same parameters, the >>>>>>> second time around, many of the output subdirectories do not have a >>>>>>> *transcripts.fasta or *proteins.fasta file in them. >>>>>>> There are 251 subdirectories and only 97 of them have all 3 output >>>>>>> files. Maker log looks ok to me, but I've attached it here as well. >>>>>>> >>>>>>> What could be the reason for this? >>>>>>> >>>>>>> Thanks >>>>>>> dhivya >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From cjfields at illinois.edu Thu Mar 13 16:04:23 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 13 Mar 2014 21:04:23 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote: One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass them in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example, we are working on EVM integration right now: http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence, for example all proteins from chicken and pigeon. Then you shouldn't lose any true genes if just using Augustus. Also, try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.
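Sajeet's workaround amounts to training GeneMark on a copy of the genome in which soft-masked (lowercase) bases are replaced with N. A minimal Python sketch follows, assuming soft-masking is encoded purely as lowercase letters; it is not a GeneMark or MAKER utility, and the file names passed to the hypothetical hard_mask_file helper are up to you.

```python
from typing import Iterable, Iterator


def hard_mask(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with soft-masked (lowercase) bases replaced by 'N'."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines are passed through unchanged
        else:
            # Any lowercase character (a, c, g, t, n, IUPAC codes) becomes 'N';
            # uppercase bases and the trailing newline are preserved.
            yield "".join("N" if ch.islower() else ch for ch in line)


def hard_mask_file(src: str, dst: str) -> None:
    """Write a hard-masked copy of the FASTA file at src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(hard_mask(fin))
```

The original soft-masked assembly stays untouched, so the hard-masked copy would only be used for the GeneMark self-training step Sajeet describes.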
Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in the number of models overall, which is worse. GeneMark had a >very similar problem (a high number of false positives) and thus gave no marked >improvement, either when used with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked on in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these as a "plugin"? > >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfierst at uoregon.edu Fri Mar 14 11:06:26 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 14 Mar 2014 09:06:26 -0700 Subject: [maker-devel] associating gene names between related strains Message-ID: Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name.
I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Fri Mar 14 12:32:02 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Mar 2014 17:32:02 +0000 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species. Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jfierst at uoregon.edu Fri Mar 14 13:01:16 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 14 Mar 2014 11:01:16 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I will try it today. Thanks for the quick reply! On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence wrote: > Hi Janna, So do you have one strain that you want to use as the > reference for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name. I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna > -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Fri Mar 14 13:02:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:02:48 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to know which genes you want to translate names to (two-column input file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST forward is the best way to do this, although I do think maker_map_ids is the tool to use in the end. The question is how to make a list of IDs to translate as the input to maker_map_ids? I would actually just use BLASTP against the reference strain, and then do reciprocal best BLAST hits. To do this you BLAST your reference proteins against your maker proteins. Then do the opposite, BLAST your maker proteins against your reference proteins. If they are each other's best hit, then they are orthologous, and you can safely make a two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates into smug1). -Carson From: Daniel Ence Date: Friday, March 14, 2014 at 11:32 AM To: Janna Fierst , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] associating gene names between related strains Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.
Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Mar 14 13:43:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:43:41 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> References: <3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com> <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> Message-ID: I'm glad you were able to fix it. I'll check to see why it was failing as well. Thanks, Carson From: dhivya arasappan Date: Friday, March 14, 2014 at 10:16 AM To: Carson Holt Subject: Re: Error when running maker2zff script Kindly ignore my previous question. I was able to manipulate the scaffold names in the gff file to get maker2zff to work.
Thanks dhivya On Mar 14, 2014, at 10:55 AM, dhivya arasappan wrote: > My message got flagged by the maker list again, so I'm forwarding this > separately to you. Is there a better way to send biggish files? > > > Thank you > Dhivya > > > > Begin forwarded message: > >> From: dhivya arasappan >> Subject: Error when running maker2zff script >> Date: March 13, 2014 at 8:35:27 PM CDT >> To: Carson Holt , maker-devel at yandell-lab.org >> >> Hi Carson, >> >> I used gff3_merge to create my gff file from maker output. I've attached it >> here. But when I run maker2zff on it, I get the following error: >> >> Can't use an undefined value as an ARRAY reference at >> /opt/apps/maker/2.30/bin/maker2zff line 177, line 7294251. >> >> It produces an incomplete output file and it looks like it may be running >> into problems when it encounters scaffold3%2F0. I'm wondering if it's having >> problems with my scaffold names. There seem to be some inconsistencies >> because it's referred to as scaffold3%F0 and scaffold3/0 in the gff file. >> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just >> fine. I did try replacing the scaffold names in the gff file, but still get >> the same error. Any ideas? >> >> Substitution command I used, for your reference: sed 's/3\%2F/3_/g' gfffile| >> sed 's/\//\_/' > mod.gfffile >> >> Thanks >> Dhivya >> > > From carsonhh at gmail.com Fri Mar 14 14:25:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 13:25:58 -0600 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: We can look into it.
-Carson From: "Fields, Christopher J" Date: Thursday, March 13, 2014 at 3:04 PM To: Sajeet Haridas Cc: Carson Holt , " List" Subject: Re: [maker-devel] geneid (or alternative ab initio predictors) That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote: > One of the problems I have found with genemark is that it does not understand > a soft-masked genome. Hence, the self training is incorrect. I have found > marked improvement to genemark's prediction by running the training on a hard > masked genome. > > > On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: >> Adding a new predictor can take some time. It obviously requires some >> coding. It's usually not too hard just to convert results to GFF3 and >> then pass it in. Integrated support is really only beneficial for >> predictors that can take "hints" from evidence alignments (for example we >> are working on EVM integration right now - >> http://evidencemodeler.sourceforge.net >> ). If SNAP and GeneMark give >> problems just drop them. GeneMark really doesn't work very well on >> genomes with complex intron/exon structure (and I really wouldn't use it >> for anything but fungi). >> >> Make sure you are also giving sufficient protein evidence. Perhaps all >> proteins from chicken and pigeon for example. Then you shouldn't find >> loss of any true genes if just using Augustus. Also try not to use gene >> count as an indicator of performance. The value is very deceptive, >> especially if the genome assembly is fragmented.
>> >> Thanks, >> Carson >> >> >> >> On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >> >>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian >>> >genome. Augustus gives pretty decent gene model predictions based on a >>> >custom model we have and the hints MAKER provides. However, SNAP seems >>> >to throw out a ton of false positives; in many cases this appears to >>> >cause erroneous gene fusions. Leaving out SNAP altogether however leads >>> >to a marked decrease in # models overall, which is worse. GeneMark had a >>> >very similar problem (high # false positives) and thus no marked >>> >improvement, either when using with both Augustus and SNAP or with >>> >Augustus alone. >>> > >>> >I have been exploring using geneid >>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some >>> >feedback on another project I worked with in the past. This would be >>> >fed into MAKER using external GFF, but I wanted to see if anyone has >>> >tried geneid with MAKER first. >>> > >>> >Finally, how hard would it be to incorporate alternative callers into >>> >MAKER? For instance, would it be possible to add these like a "plugin"? >>> > >>> >chris
From cjfields at illinois.edu Fri Mar 14 21:22:55 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 15 Mar 2014 02:22:55 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu> Not an issue at the moment; I'll likely supply these via gff for now. If needed I can work off an svn checkout and send along a patch should I ever manage to eke out time to work on it. chris On Mar 14, 2014, at 2:25 PM, Carson Holt wrote: We can look into it. -Carson From: "Fields, Christopher J" Date: Thursday, March 13, 2014 at 3:04 PM To: Sajeet Haridas Cc: Carson Holt, " List" Subject: Re: [maker-devel] geneid (or alternative ab initio predictors) That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote: One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems just drop them.
GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence. Perhaps all proteins from chicken and pigeon for example. Then you shouldn't find loss of any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented. Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether however leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when using with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked with in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a "plugin"? > >chris
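Sajeet's tip in this thread (run GeneMark's self-training on a hard-masked rather than a soft-masked genome) amounts to turning the lowercase repeat bases into N. A minimal Python sketch follows, assuming the soft masking is encoded purely as lowercase letters (the usual soft-masking convention) and that FASTA header lines must be left untouched; the function names are illustrative and not part of MAKER or GeneMark.

```python
from typing import Iterable, Iterator


def hard_mask_line(line: str) -> str:
    """Replace every lowercase (soft-masked) base with N; header lines pass through."""
    if line.startswith(">"):
        return line
    return "".join("N" if ch.islower() else ch for ch in line)


def hard_mask(lines: Iterable[str]) -> Iterator[str]:
    """Hard-mask a FASTA stream line by line, so large genomes need little memory."""
    for line in lines:
        yield hard_mask_line(line)


def hard_mask_file(src: str, dst: str) -> None:
    """Write a hard-masked copy of the FASTA file at src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(hard_mask(fin))
```

As Sajeet describes it, the hard-masked copy is only for the GeneMark training step; the thread does not suggest changing the soft-masked genome that MAKER itself annotates.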
From carson.holt at genetics.utah.edu Mon Mar 17 14:45:15 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Mar 2014 19:45:15 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: I have attached 4 files for you to place in the .../maker/Widgets/ directory. The *blast.pm files will suppress the BLAST+ failures you are getting (alternatively you can just downgrade to BLAST+ 2.2.27 to get the same effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. In the latest release NCBI redid all their warnings and error codes, so it spits out a lot of garbage and fails with different messages than it did before. For example, BLAST now warns you every time it encounters a FASTA header with a comment (virtually every FASTA entry in existence falls in this category), so your screen will be awash with meaningless warning messages. The fgenesh.pm file will fix the other failure, which only occurs if you use fgenesh simultaneously with the correct_est_fusion=1 option. No other predictors are affected. Thanks, Carson On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: >Dear Carson > >Sorry for the late reply. I was away for a couple of days. I have uploaded >the output files plus control and error output on the FTP site that you >provided >The user ID is borhanh > >I used blast+ for this run. > > > > >Regards > > >HB > > > > > > > > >On 14-03-13 10:00 AM, "Carson Holt" wrote: > >>Just resending this to the correct maker-devel address. Please when >>replying, do not CC the incorrect maker-devel-bounce address. >> >>Thanks, >>Carson >> >> >>On 3/13/14, 9:56 AM, "Carson Holt" wrote: >> >>>FGENESH is not a heavily used tool, so depending on which version it is >>>(either too old or too new), output might be slightly different which >>>could cause incorrect parsing.
Could you tar up your maker.output >>>folder, >>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>(send me your user/guest ID after you upload). >>> >>>For the BLAST error, use BLAST+ instead. You are using blastall which >>>is >>>the old legacy version of NCBI BLAST. You can do this by setting the >>>blast type in maker_bopts.ctl and the location of executables in >>>maker_exe.ctl. >>> >>>Thanks, >>>Carson >>> >>> >>> >>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>wrote: >>> >>>>Dear Maker users >>>> >>>> >>>>I ran maker (2.31) on a fungal genome and found out that it inserted >>>>the >>>>word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>in the nucleotide sequence of some of the genes. This seems to >>>>be related to transcripts predicted by fgenesh_masked. >>>> >>>> >>>>Here is an example for one of the genes >>>> >>>> >>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>offset:0 AE >>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>TTTCGACAAGC >>>> >>>>The same genome sequence was
used for the first round of maker (2.10) >>>>without such problems. I checked the sequence for the scaffold related >>>>to >>>>one of the affected transcripts and there was no error in the sequence. >>>>I am not sure what is causing this. The only error that I could spot in >>>>the output error file is the following >>>> >>>> >>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>contexts/frames of query sequences. >>>> >>>> >>>> >>>>Your help is appreciated >>>> >>>> >>>> >>>>HB >>>> >>>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: blastn.pm Type: text/x-perl-script Size: 8112 bytes Desc: blastn.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastx.pm Type: text/x-perl-script Size: 8218 bytes Desc: blastx.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fgenesh.pm Type: text/x-perl-script Size: 19744 bytes Desc: fgenesh.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tblastx.pm Type: text/x-perl-script Size: 9113 bytes Desc: tblastx.pm URL: From carsonhh at gmail.com Mon Mar 17 16:14:42 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Mar 2014 15:14:42 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID: Just an update on this. I've fixed the maker2zff script to handle the issues seen. Looking at this actually brought to light another issue. The GFF3 spec is inconsistent about character escaping in column 1 (the seqid), column 9 (the ID and Target attributes), and the FASTA IDs of embedded sequence. We're updating the GFF3 spec to clarify this so that the same ID gets treated the same way for character escaping everywhere it appears.
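Carson goes on to recommend restricting IDs to the GFF3 column 1 character set `[a-zA-Z0-9.:^*$@!+_?-|]` and escaping anything else. The Python sketch below shows both options: percent-escaping (the `%2F`-for-`/` form seen in dhivya's `scaffold3%2F0`) and outright renaming. The helper names are hypothetical, and whichever option is chosen would need to be applied identically to the assembly FASTA headers and to every GFF3 file so that the IDs keep matching.

```python
import re
from urllib.parse import unquote

# Characters GFF3 allows unescaped in a seqid (column 1), per the spec text quoted
# in this thread: [a-zA-Z0-9.:^*$@!+_?-|]. "-" is escaped to keep it literal.
_ALLOWED = re.compile(r"[A-Za-z0-9.:^*$@!+_?\-|]")
_DISALLOWED = re.compile(r"[^A-Za-z0-9.:^*$@!+_?\-|]")


def is_safe_id(seqid: str) -> bool:
    """True if the ID is non-empty and needs no escaping at all."""
    return bool(seqid) and _DISALLOWED.search(seqid) is None


def gff3_escape(seqid: str) -> str:
    """Percent-encode each disallowed character as UTF-8 %XX bytes ('/' -> '%2F')."""
    out = []
    for ch in seqid:
        if _ALLOWED.fullmatch(ch):
            out.append(ch)
        else:
            out.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(out)


def rename_unsafe(seqid: str, replacement: str = "_") -> str:
    """Decode any %XX escapes first, then replace every disallowed character."""
    return _DISALLOWED.sub(replacement, unquote(seqid))
```

Renaming to safe IDs before annotation is the more defensive choice of the two, given Carson's warning that characters outside the set have a high chance of breaking some downstream tool.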
To be safe though, only use these characters in your contig IDs for the assembly when using any tool that reads or outputs GFF3 --> a-zA-Z0-9.:^*$@!+_?-| Any character not in that set has a high chance of breaking some downstream tool. For now, just assume that the strict interpretation from the GFF3 spec for column 1 must be used on all IDs everywhere (see below). >>Column 1: "seqid" >>The ID of the landmark used to establish the coordinate system for the >>current feature. >>IDs may contain any characters, but must escape any characters not in >>the set [a-zA-Z0-9.:^*$@!+_?-|]. >>In particular, IDs may not contain unescaped whitespace and must not >>begin with an unescaped ">". Thanks, Carson On 3/13/14, 7:35 PM, "dhivya arasappan" wrote: >Hi Carson, > >I used gff3_merge to create my gff file from maker output. I've >attached it here. But when I run maker2zff on it, I get the following >error: > >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/ >2.30/bin/maker2zff line 177, line 7294251. > >It produces an incomplete output file and it looks like it may be >running into problems when it encounters scaffold3%2F0. I'm wondering >if it's having problems with my scaffold names. There seem to be some >inconsistencies because it's referred to as scaffold3%F0 and >scaffold3/0 in the gff file. It goes through other scaffolds like >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine. I did try replacing the >scaffold names in the gff file, but still get the same error. Any >ideas? > >Substitution command I used, for your reference: sed 's/3\%2F/3_/g' >gfffile| sed 's/\//\_/' > mod.gfffile > >Thanks >Dhivya > From darasappan at gmail.com Mon Mar 17 16:20:18 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 17 Mar 2014 16:20:18 -0500 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID: Awesome! Thanks Carson. Dhivya On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt wrote: > Just an update on this.
I've fixed the maker2zff script to handle the > issues seen. Looking at this actually brought to light another issue. > The GFF3 spec is inconsistent about character escaping in column 1 > (the seqid), column 9 (the ID and Target attributes), and the FASTA > IDs of embedded sequence. We're updating the GFF3 spec to clarify > this so that the same ID gets treated the same way for character > escaping everywhere it appears. > > To be safe though, only use these characters in your contig IDs for the > assembly when using any tool that reads or outputs GFF3 --> > a-zA-Z0-9.:^*$@!+_?-| > > Any character not in that set has a high chance of breaking some > downstream tool. For now, just assume that the strict interpretation from the > GFF3 spec for column 1 must be used on all IDs everywhere (see below). > > >>Column 1: "seqid" > >>The ID of the landmark used to establish the coordinate system for the > >>current feature. > >>IDs may contain any characters, but must escape any characters not in > >>the set [a-zA-Z0-9.:^*$@!+_?-|]. > >>In particular, IDs may not contain unescaped whitespace and must not > >>begin with an unescaped ">". > > > Thanks, > Carson > > > > On 3/13/14, 7:35 PM, "dhivya arasappan" wrote: > > >Hi Carson, > > > >I used gff3_merge to create my gff file from maker output. I've > >attached it here. But when I run maker2zff on it, I get the following > >error: > > > >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/ > >2.30/bin/maker2zff line 177, line 7294251. > > > >It produces an incomplete output file and it looks like it may be > >running into problems when it encounters scaffold3%2F0. I'm wondering > >if it's having problems with my scaffold names. There seem to be some > >inconsistencies because it's referred to as scaffold3%F0 and > >scaffold3/0 in the gff file. It goes through other scaffolds like > >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.
I did try replacing the > >scaffold names in the gff file, but still get the same error. Any > >ideas? > > > >Substitution command I used, for your reference: sed 's/3\%2F/3_/g' > >gfffile| sed 's/\//\_/' > mod.gfffile > > > >Thanks > >Dhivya > > > > > From marc.hoeppner at bils.se Tue Mar 18 06:43:43 2014 From: marc.hoeppner at bils.se (Marc Höppner) Date: Tue, 18 Mar 2014 12:43:43 +0100 Subject: [maker-devel] Maker changes 2.30-2.31 Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Hi, I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31? There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily; 2.31, however, has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run. I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently. Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at bils.se From carsonhh at gmail.com Tue Mar 18 10:07:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 09:07:07 -0600 Subject: [maker-devel] Maker changes 2.30-2.31 In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Message-ID: Attached. Also make sure you are using the tarball from the lab website and not the prerelease from the subversion repository.
Thanks, Carson On 3/18/14, 5:43 AM, "Marc Höppner" wrote: >Hi, > >I have observed a few oddities with our installation of maker 2.31 and >was therefore wondering if there is a change log somewhere to get some >information on what, if anything, was changed between 2.30 and 2.31? > >There is of course a good chance that the issues I am seeing (pipeline >locking up) are related to our setup and not necessarily Maker - but I'd >like to make sure, if possible. Both versions use the exact same external >binaries etc, and were run on the same data. 2.30 is running along >happily; 2.31, however, has randomly locked up. I should perhaps also say >that I am running on SL 6.2 and am using mpich2 for the MPI run. > >I haven't done any more systematic testing so far, but will probably do >so if there is no "obvious" reason why Maker 2.31 should behave >differently. > >Cheers, > >Marc > > > > >Marc P. Hoeppner, PhD >Department for Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoeppner at bils.se -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svn_log.txt URL: From fbarreto at ucsd.edu Tue Mar 18 11:08:47 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:08:47 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP Message-ID: Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 From carsonhh at gmail.com Tue Mar 18 11:14:29 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 10:14:29 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: That sounds good. 1,500 initial models should be more than sufficient for the first round of training. -Carson From: Felipe Barreto Date: Tuesday, March 18, 2014 at 10:08 AM To: MAKER group Subject: [maker-devel] Size of initial EST training set for SNAP Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 From dence at genetics.utah.edu Tue Mar 18 11:16:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 18 Mar 2014 16:16:20 +0000 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: Hi Felipe, I think 1500 models sounds like a good-sized set with which to train SNAP. I think that SNAP expects ~1000 models for training. The only other comment on the approach is perhaps that using only one ab initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.
Good luck and let us know if there's anything we can help with! Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Felipe Barreto [fbarreto at ucsd.edu] Sent: Tuesday, March 18, 2014 10:08 AM To: MAKER group Subject: [maker-devel] Size of initial EST training set for SNAP Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs. From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093
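Felipe's first step, picking a high-confidence EST subset from BLAST alignments to Swiss-Prot before the `est2genome=1` run, could look roughly like the Python sketch below. It assumes a BLAST+ blastx search written with `-outfmt "6 qseqid sseqid bitscore sstart send slen"`; the bit-score and coverage cutoffs are placeholders rather than values from the thread, and coverage is taken from the best single HSP, which undercounts hits split across several HSPs. All names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str          # EST / transcript ID
    subject: str        # Swiss-Prot accession
    bitscore: float
    subject_cov: float  # fraction of the protein covered by this HSP


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse '6 qseqid sseqid bitscore sstart send slen' tabular lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        query, subject, bits, sstart, send, slen = line.rstrip("\n").split("\t")
        cov = (abs(int(send) - int(sstart)) + 1) / int(slen)
        yield Hit(query, subject, float(bits), cov)


def select_training_ests(lines: Iterable[str], min_bits: float = 200.0,
                         min_subject_cov: float = 0.9) -> List[str]:
    """Keep ESTs whose best-scoring hit passes both (placeholder) thresholds."""
    best: Dict[str, Hit] = {}
    for hit in parse_hits(lines):
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return sorted(q for q, h in best.items()
                  if h.bitscore >= min_bits and h.subject_cov >= min_subject_cov)


def filter_fasta(lines: Iterable[str], keep: Iterable[str]) -> Iterator[str]:
    """Yield only the FASTA records whose first header word is in keep."""
    wanted = set(keep)
    writing = False
    for line in lines:
        if line.startswith(">"):
            writing = (line[1:].split() or [""])[0] in wanted
        if writing:
            yield line
```

The filtered FASTA would then serve as the `est=` input for the initial `est2genome=1` round, with the full EST set (and `est2genome=0`) used in the later rounds, as Felipe outlines.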
From barry.utah at gmail.com Tue Mar 18 11:26:45 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 18 Mar 2014 10:26:45 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Hi Felipe, I think that plan sounds quite reasonable. To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on. Since you're an order of magnitude above that, I think you're in good shape. Having said that, of course if you have concerns about biases in your training set you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high confidence genes that your set is missing. Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from I would suggest that you also consider including at least one other gene predictor. Augustus works well on a wide variety of genomes and while it is more difficult to train than SNAP it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome. This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER. Thanks, B On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote: > Hi, all, > > I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
> > From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). > > I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). > > Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. > > Thanks for any insight! > > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Tue Mar 18 11:59:39 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:59:39 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> References: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Message-ID: Thanks, guys, for the swift and informative response! 
I will try to train Augustus again, but at the very least, will include it with an arthropod HMM in my final run (in addition to my custom SNAP HMM).

Cheers,
Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093
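Barry's rule of thumb (a minimum of a few hundred gene models to train on) can be checked before training SNAP by counting the models in the first est2genome run's GFF3. The Python sketch below is a hedged illustration, not a MAKER tool: it assumes the final MAKER models are the rows whose source column is `maker` and whose type is `mRNA`, and that a merged GFF3 may end with a `##FASTA` section. The function name, the example file name and the 300-model threshold are hypothetical.

```python
import sys
from typing import Iterable


def count_maker_mrnas(gff_lines: Iterable[str], source: str = "maker") -> int:
    """Count mRNA features with the given source column in GFF3 text.

    Assumption (not taken from MAKER documentation): final MAKER models
    appear as rows with column 2 == "maker" and column 3 == "mRNA".
    """
    count = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] == source and cols[2] == "mRNA":
            count += 1
    return count


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_models.py first_run.all.gff [minimum]
    path = sys.argv[1]
    minimum = int(sys.argv[2]) if len(sys.argv) > 2 else 300  # a "few hundred", assumed
    with open(path) as handle:
        n = count_maker_mrnas(handle)
    verdict = "looks sufficient" if n >= minimum else "may be too few"
    print(f"{n} MAKER mRNA models; {verdict} for training (threshold {minimum})")
```

This only counts models; it does not replace the SNAP training steps (maker2zff, fathom, forge, hmm-assembler.pl) described in the MAKER tutorial.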
From darasappan at gmail.com  Tue Mar 18 14:27:11 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 14:27:11 -0500
Subject: [maker-devel] maker snap output files

Hello,

I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully, and when I ran fasta_merge, I got several output FASTA files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different numbers of sequences.

Similarly, with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya

From carsonhh at gmail.com  Tue Mar 18 14:34:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 13:34:05 -0600
Subject: [maker-devel] maker snap output files

maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
maker.snap_masked.proteins.fasta - these are the raw unfiltered SNAP ab initio predictions (for reference purposes)
maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use InterProScan (very slow) to analyze this file for false negatives.

These files are also described in the README distributed with MAKER, in the "MAKER OUTPUT" section.

Thanks,
Carson

From darasappan at gmail.com  Tue Mar 18 15:05:39 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 15:05:39 -0500
Subject: [maker-devel] maker snap output files
Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

Thanks Carson.

Is it normal that in my maker results after running SNAP, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-SNAP maker results? I assumed that annotations through alignment plus annotations through prediction would equal more annotations.

The unfiltered proteins file has more proteins, though.

Thanks
Dhivya

From carsonhh at gmail.com  Tue Mar 18 15:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

There can also be hint-based predictions. They may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over-predict (by as much as 10-fold). You should always review your annotations in something like Apollo, to see how the models compare to the evidence. Counts alone don't really mean anything.

Thanks,
Carson
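To compare the fasta_merge outputs Carson describes, a per-file record count is enough, with his caveat that counts are only a sanity check and the models still need reviewing against the evidence in something like Apollo. A minimal Python sketch, not part of MAKER; the default glob pattern is an assumption based on the file names in Dhivya's message.

```python
import glob
import os
import sys
from typing import Dict, Iterable


def count_records(lines: Iterable[str]) -> int:
    """Number of FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_files(paths: Iterable[str]) -> Dict[str, int]:
    """Map each file's base name to its record count."""
    counts = {}
    for path in paths:
        with open(path) as handle:
            counts[os.path.basename(path)] = count_records(handle)
    return counts


if __name__ == "__main__":
    # Default pattern is an assumption: every *.proteins.fasta file in the
    # current directory (use "*.transcripts.fasta" for the transcript files).
    pattern = sys.argv[1] if len(sys.argv) > 1 else "*.proteins.fasta"
    for name, n in sorted(count_fasta_files(glob.glob(pattern)).items()):
        print(f"{n:>8}  {name}")
```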
From chrisbioinfo at gmail.com  Wed Mar 19 06:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2

Hello,

I'm installing/using maker2 for the first time and I get an error when using it. I am certainly missing something, but I don't know what. I compiled maker with no error message, and I have all these directories after compilation:

bin data GMOD INSTALL lib LICENSE MWAS perl README src

Nevertheless, when I try maker2 on the test data (dpp_contig.fasta) I get this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

Ideas?

Best,
Christelle

From carsonhh at gmail.com  Wed Mar 19 08:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2

Your problem is one of the following: you need to reinstall the DBD::SQLite module; you are running in a directory you don't have permissions for; you set your TMPDIR environment variable, or the TMP value in maker_opts.ctl, to an NFS-mounted or memory-mounted directory; or you are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that has issues (probably with the DB or SQLite modules). You can also completely delete the output directory and start again to see if it was just a random error.

You should look at each of those first. You can also run MAKER with the --debug command-line flag and send the output to me if none of those seem to be the issue.

Thanks,
Carson
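Several of Carson's suspects (a broken SQLite build, missing permissions, an NFS- or memory-mounted directory) come down to whether SQLite can create and write a database file in the output directory. As a cross-check, here is a hedged Python sketch using the standard-library sqlite3 module. It is not part of MAKER and the function name is hypothetical; it exercises Python's bundled SQLite rather than the Perl DBD::SQLite build MAKER uses, and a passing probe does not prove that file locking is reliable on a network filesystem.

```python
import os
import sqlite3
import sys
import tempfile


def sqlite_works_in(directory: str) -> bool:
    """Return True if SQLite can create and write a database file in `directory`."""
    try:
        # Unique, empty probe file; SQLite treats a 0-byte file as a new database.
        fd, path = tempfile.mkstemp(suffix=".db", dir=directory)
    except OSError:
        return False  # directory missing, not writable, path too long, ...
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        try:
            # A real write (and commit) exercises the journal and file locking.
            conn.execute("CREATE TABLE probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False  # e.g. "unable to open database file", "database is locked"
    finally:
        os.remove(path)  # never leave the probe behind


if __name__ == "__main__":
    # Usage: python sqlite_probe.py dpp_contig.maker.output
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    status = "SQLite OK" if sqlite_works_in(target) else "SQLite FAILED"
    print(status, "in", os.path.abspath(target))
```

The probe uses a short temporary name, so it tests the directory, not the exact database file name MAKER builds inside it.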
From rbharris at uw.edu  Wed Mar 19 20:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once, and used the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time, probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times, and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu  Thu Mar 20 00:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number

Hi Rebecca,

As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 or 10 kbp won't contribute much to the gene count in the end.

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look.

I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.
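Daniel's two checks, the assembly's N50 and how much of it sits in scaffolds below 5-10 kbp, are easy to compute before deciding what to drop. A minimal Python sketch, not a MAKER tool: the function names are hypothetical, and the cutoff is whatever you pass on the command line (5000 and 10000 are the values Daniel mentions). Scaffolds removed this way are simply not annotated, so it is worth recording what was excluded.

```python
import sys
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records as FASTA with fixed-width sequence lines."""
    out = []
    for header, seq in records:
        out.append(f">{header}\n")
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width] + "\n")
    return "".join(out)


if __name__ == "__main__":
    # Usage: python prune.py scaffolds.fasta 10000 > scaffolds.min10k.fasta
    path, min_len = sys.argv[1], int(sys.argv[2])
    with open(path) as handle:
        records = read_fasta(handle)
    kept = [r for r in records if len(r[1]) >= min_len]
    total_bp = sum(len(s) for _, s in records)
    kept_bp = sum(len(s) for _, s in kept)
    print(f"N50={n50(len(s) for _, s in records)}; kept {len(kept)}/{len(records)} "
          f"scaffolds, {kept_bp}/{total_bp} bp", file=sys.stderr)
    sys.stdout.write(to_fasta(kept))
```

The summary goes to stderr so that stdout carries only the filtered FASTA.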
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Thu Mar 20 12:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts, and transcripts that don't overlap, which ones are reflected in the GFF file? The IDs in the GFF file (for exons, genes, mRNAs) all say something like "*snap-gene", so does this mean these are the genes from the SNAP prediction tool?

Thanks
dhivya

From carsonhh at gmail.com  Thu Mar 20 12:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

MAKER transcripts will be the gene/mRNA/exon/CDS features. All other transcripts, from SNAP etc., will be match/match_part features in the GFF3. When you look at these in something like Apollo, they will be placed in different viewing panels based on their type.

Thanks,
Carson

From carsonhh at gmail.com  Thu Mar 20 12:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number

You may also want to try the GFF3 pass-through options. Basically, you give your GFF3 file to maker_gff and tell it what kinds of evidence to maintain from your past run by setting the 'pass' options to 1. Then you can run without your FASTA file inputs for ESTs, proteins, and repeats (also blank out the RepeatMasker species). The values will be passed forward from the GFF3 file into the current run.

--Carson

From carsonhh at gmail.com  Fri Mar 21 09:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2

Glad it's working. Let us know if anything else comes up.

--Carson

From: Chris Bioinfo
Date: Friday, March 21, 2014 at 4:57 AM
To: Carson Holt
Subject: Re: [maker-devel] Annotation with maker2

Dear Carson,

It works!! After many difficulties: I installed SQLite 3.8.4.1 yesterday, and it was "better" (no error message when launching sqlite3). Yet my test.db was still not created. Today I found the trick! The problem was simply that the path to the database was too long... only that...

Thanks for your time and your help, Carson!
All the best,
Christelle

2014-03-20 18:21 GMT+01:00 Carson Holt :
> Also, you can use this command line to test both before and after installing:
>
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
>
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in
> the error.
>
> --Carson
>
>
> From: Carson Holt
> Date: Thursday, March 20, 2014 at 11:03 AM
> To: Chris Bioinfo
> Subject: Re: [maker-devel] Annotation with maker2
>
> The failure is in SQLite, so you have to reinstall, i.e. 'force install
> DBD::SQLite' in CPAN. Otherwise you are just keeping whatever module is
> installed, which may have broken C bindings.
>
> You may also have to install SQLite 3.8.4.1, and then reinstall the Perl
> modules using the force option to force a recompile.
>
> --Carson
>
>
> From: Chris Bioinfo
> Date: Thursday, March 20, 2014 at 10:57 AM
> To: Carson Holt
> Subject: Re: [maker-devel] Annotation with maker2
>
> cpan[2]> install DBI
> DBI is up to date (1.631).
>
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
>
> my test.db is indeed not created:
>
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite>
>
>
> 2014-03-20 17:36 GMT+01:00 Carson Holt :
>> I'm actually checking the mount points for the disk. SQLite won't work on
>> filesystems that don't implement locks, and 'df' is a good way to infer some
>> of that info.
>>
>> Basically I still think this is SQLite failing on your system. You might
>> need to reinstall SQLite and then reinstall the Perl DBI and DBD::SQLite
>> modules.
>>
>> You can also run a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>>
>> This will work if you have sqlite3 installed, and any error it gives may be
>> informative.
>> >> --Carson >> >> From: Chris Bioinfo >> Date: Thursday, March 20, 2014 at 10:29 AM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Annotation with maker2 >> >> oh sorry >> >> my disks are quite full, but I guess there is still space for maker >> >> /dev/sdc1 19T 18T 934G 95% /home >> >> >> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo : >>> this: >>> >>> du -h dpp_contig.maker.output/ >>> 0 dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0 >>> 88K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500 >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500 >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05 >>> 92K dpp_contig.maker.output/dpp_contig_datastore >>> 4.0K dpp_contig.maker.output/dpp_contig_master_datastore_index.log >>> 4.0K dpp_contig.maker.output/maker_bopts.log >>> 4.0K dpp_contig.maker.output/maker_exe.log >>> 8.0K dpp_contig.maker.output/maker_opts.log >>> 16K dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1 >>> 44K dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1 >>> 14M dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10 >>> 32K dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1 >>> 14M dpp_contig.maker.output/mpi_blastdb >>> 0 dpp_contig.maker.output/seen.dbm >>> >>> >>> >>> 2014-03-20 17:10 GMT+01:00 Carson Holt : >>> >>>> What does 'df -h dpp_contig.maker.output' show? >>>> >>>> --Carson >>>> >>>> From: Chris Bioinfo >>>> Date: Thursday, March 20, 2014 at 10:00 AM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Annotation with maker2 >>>> >>>> sorry, mistake on the dir!
>>>> >>>> I have these files: >>>> dpp_contig_datastore dpp_contig_master_datastore_index.log >>>> maker_bopts.log maker_exe.log maker_opts.log mpi_blastdb seen.dbm >>>> >>>> >>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo : >>>>> no, >>>>> >>>>> I have these files in the directory: >>>>> dpp_contig.fasta dpp_est.fasta hsap_contig.fasta >>>>> hsap_protein.fasta maker_exe.ctl >>>>> dpp_contig.maker.output dpp_protein.fasta hsap_est.fasta >>>>> maker_bopts.ctl maker_opts.ctl te_proteins.fasta >>>>> >>>>> >>>>> >>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt : >>>>> >>>>>> Did >>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist? >>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Chris Bioinfo >>>>>> Date: Thursday, March 20, 2014 at 9:50 AM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>> >>>>>> cdantec at belem:~$ /usr/bin/perl -v >>>>>> >>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for x86_64-linux-gnu-thread-multi >>>>>> (with 46 registered patches, see perl -V for more detail) >>>>>> >>>>>> Copyright 1987-2013, Larry Wall >>>>>> >>>>>> Perl may be copied only under the terms of either the Artistic License or the >>>>>> GNU General Public License, which may be found in the Perl 5 source kit. >>>>>> >>>>>> Complete documentation for Perl, including FAQ lists, should be found on >>>>>> this system using "man perl" or "perldoc perl". If you have access to the >>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page. >>>>>> >>>>>> >>>>>> >>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt : >>>>>>> What do you get when you type --> /usr/bin/perl -v >>>>>>> >>>>>>> The key to the error is this line --> >>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file >>>>>>> >>>>>>> Either the database doesn't exist, or it is corrupt. Does it exist? >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Thursday, March 20, 2014 at 9:25 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Dear Carson, >>>>>>> >>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions in my >>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is in >>>>>>> /usr/bin/perl. >>>>>>> I have deleted the output directory many times, but I still get the same problem. >>>>>>> >>>>>>> So here is the debug output: >>>>>>> ****MODULE VERSION INFO >>>>>>> 0.05 Acme::Damn /usr/local/lib/perl/5.18.1/Acme/Damn.pm >>>>>>> 1.01 AnyDBM_File /usr/share/perl/5.18/AnyDBM_File.pm >>>>>>> 5.73 AutoLoader /usr/share/perl/5.18/AutoLoader.pm >>>>>>> UNKNOWN Bio::AnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm >>>>>>> UNKNOWN Bio::AnnotatableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm >>>>>>> UNKNOWN Bio::Annotation::Collection >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm >>>>>>> UNKNOWN Bio::Annotation::SimpleValue >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm >>>>>>> UNKNOWN Bio::Annotation::TypeManager >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm >>>>>>> UNKNOWN Bio::AnnotationCollectionI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm >>>>>>> UNKNOWN Bio::AnnotationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm >>>>>>> 1.006923 Bio::DB::Fasta >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm >>>>>>> UNKNOWN Bio::DB::InMemoryCache >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm >>>>>>> UNKNOWN Bio::DB::IndexedBase >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm >>>>>>> UNKNOWN Bio::DB::RandomAccessI >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm >>>>>>> UNKNOWN Bio::DB::SeqI >>>>>>>
/usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm >>>>>>> UNKNOWN Bio::DescribableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm >>>>>>> UNKNOWN Bio::Event::EventGeneratorI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm >>>>>>> UNKNOWN Bio::Event::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm >>>>>>> UNKNOWN Bio::Factory::SequenceFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm >>>>>>> UNKNOWN Bio::FeatureHolderI >>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm >>>>>>> UNKNOWN Bio::IdentifiableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm >>>>>>> UNKNOWN Bio::LocatableSeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm >>>>>>> UNKNOWN Bio::Location::Atomic >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm >>>>>>> UNKNOWN Bio::Location::CoordinatePolicyI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm >>>>>>> UNKNOWN Bio::Location::Fuzzy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm >>>>>>> UNKNOWN Bio::Location::FuzzyLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm >>>>>>> UNKNOWN Bio::Location::Simple >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm >>>>>>> UNKNOWN Bio::Location::Split >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm >>>>>>> UNKNOWN Bio::Location::SplitLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm >>>>>>> UNKNOWN Bio::Location::WidestCoordPolicy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm >>>>>>> UNKNOWN Bio::LocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm >>>>>>> UNKNOWN Bio::PrimarySeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm 
>>>>>>> 1.006923 Bio::PrimarySeqI >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm >>>>>>> UNKNOWN Bio::Range /usr/local/share/perl/5.18.1/Bio/Range.pm >>>>>>> UNKNOWN Bio::RangeI /usr/local/share/perl/5.18.1/Bio/RangeI.pm >>>>>>> 1.006923 Bio::Root::Exception >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm >>>>>>> UNKNOWN Bio::Root::HTTPget >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm >>>>>>> UNKNOWN Bio::Root::IO >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm >>>>>>> 1.006923 Bio::Root::Root >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm >>>>>>> 1.006923 Bio::Root::RootI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm >>>>>>> 1.006923 Bio::Root::Version >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm >>>>>>> UNKNOWN Bio::Search::HSP::GenericHSP >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.p>>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/august >>>>>>> us.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn >>>>>>> .pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx >>>>>>> .pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2g >>>>>>> enome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2ge >>>>>>> nome.pm >>>>>>> UNKNOWN 
Bio::Search::HSP::PhatHSP::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenes >>>>>>> h.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genema >>>>>>> rk.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protei >>>>>>> n2genome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeat >>>>>>> masker.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snosca >>>>>>> n.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblast >>>>>>> x.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnasc >>>>>>> an.pm >>>>>>> 1.006923 Bio::Search::Hit::GenericHit >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.p>>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/august >>>>>>> us.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastn >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn >>>>>>> .pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx >>>>>>> .pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2g >>>>>>> enome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2ge >>>>>>> nome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenes >>>>>>> h.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genema >>>>>>> rk.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protei >>>>>>> n2genome.pm >>>>>>> 1.006923 Bio::Search::Hit::PhatHit::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeat >>>>>>> masker.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snosca >>>>>>> n.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblast >>>>>>> x.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnasc >>>>>>> an.pm >>>>>>> 1.006923 Bio::Search::SearchUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm >>>>>>> UNKNOWN Bio::SearchIO >>>>>>> 
/usr/local/share/perl/5.18.1/Bio/SearchIO.pm >>>>>>> UNKNOWN Bio::SearchIO::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm >>>>>>> UNKNOWN Bio::SearchIO::SearchResultEventBuilder >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm >>>>>>> UNKNOWN Bio::Seq /usr/local/share/perl/5.18.1/Bio/Seq.pm >>>>>>> UNKNOWN Bio::Seq::SeqFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm >>>>>>> UNKNOWN Bio::SeqAnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm >>>>>>> UNKNOWN Bio::SeqFeature::FeaturePair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm >>>>>>> UNKNOWN Bio::SeqFeature::Generic >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm >>>>>>> UNKNOWN Bio::SeqFeature::Similarity >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm >>>>>>> UNKNOWN Bio::SeqFeature::SimilarityPair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm >>>>>>> UNKNOWN Bio::SeqFeatureI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm >>>>>>> UNKNOWN Bio::SeqI /usr/local/share/perl/5.18.1/Bio/SeqI.pm >>>>>>> UNKNOWN Bio::SeqUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm >>>>>>> 1.006923 Bio::Tools::CodonTable >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm >>>>>>> UNKNOWN Bio::Tools::GFF >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm >>>>>>> 1.006923 Bio::Tools::IUPAC >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm >>>>>>> 7.3 Bit::Vector /usr/local/lib/perl/5.18.1/Bit/Vector.pm >>>>>>> 0.01 CGL::Annotation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm >>>>>>> 0.01 CGL::Annotation::Feature >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm >>>>>>> 0.01 CGL::Annotation::Feature::Contig >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig >>>>>>> .pm >>>>>>> 0.01 CGL::Annotation::Feature::Exon >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.p>>>>>>> m >>>>>>> 0.01 CGL::Annotation::Feature::Gene >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.p>>>>>>> m >>>>>>> 0.01 CGL::Annotation::Feature::Intron >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron >>>>>>> .pm >>>>>>> 0.01 CGL::Annotation::Feature::Protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protei >>>>>>> n.pm >>>>>>> 0.01 CGL::Annotation::Feature::Sequence_variant >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequen >>>>>>> ce_variant.pm >>>>>>> 0.01 CGL::Annotation::Feature::Transcript >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transc >>>>>>> ript.pm >>>>>>> 0.01 CGL::Annotation::FeatureLocation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocatio >>>>>>> n.pm >>>>>>> 0.01 CGL::Annotation::FeatureRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelatio >>>>>>> nship.pm >>>>>>> 0.01 CGL::Annotation::Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm >>>>>>> 0.01 CGL::Annotation::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm >>>>>>> 0.01 CGL::Clone >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm >>>>>>> 0.01 CGL::Ontology::Node >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm >>>>>>> 0.01 CGL::Ontology::NodeRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship >>>>>>> .pm >>>>>>> 0.01 CGL::Ontology::Ontology >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm >>>>>>> 0.01 CGL::Ontology::Parser::OBO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm >>>>>>> 0.01 CGL::Ontology::SO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm >>>>>>> 0.01 
CGL::Ontology::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm >>>>>>> 0.01 CGL::Revcomp >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm >>>>>>> 0.01 CGL::TranslationMachine >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm >>>>>>> 1.32 Carp /usr/local/share/perl/5.18.1/Carp.pm >>>>>>> 1.32 Carp::Heavy /usr/local/share/perl/5.18.1/Carp/Heavy.pm >>>>>>> 0.64 Class::Struct /usr/share/perl/5.18/Class/Struct.pm >>>>>>> 0.36 Clone /usr/local/lib/perl/5.18.1/Clone.pm >>>>>>> 5.018001 Config /usr/lib/perl/5.18/Config.pm >>>>>>> 3.40 Cwd /usr/lib/perl/5.18/Cwd.pm >>>>>>> 1.42 DBD::SQLite /usr/local/lib/perl/5.18.1/DBD/SQLite.pm >>>>>>> 1.631 DBI /usr/local/lib/perl/5.18.1/DBI.pm >>>>>>> 1.827 DB_File /usr/lib/perl/5.18/DB_File.pm >>>>>>> 2.145 Data::Dumper /usr/lib/perl/5.18/Data/Dumper.pm >>>>>>> 0.11 Datastore::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm >>>>>>> 0.01 Datastore::MD5 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm >>>>>>> 2.53 Digest::MD5 /usr/local/lib/perl/5.18.1/Digest/MD5.pm >>>>>>> 1.16 Digest::base /usr/share/perl/5.18/Digest/base.pm >>>>>>> >>>>>>> UNKNOWN Dumper::GFF::GFFV3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm >>>>>>> UNKNOWN Dumper::XML::Game >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm >>>>>>> UNKNOWN Dumper::XML::Game_Xml >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm >>>>>>> 1.18 DynaLoader /usr/lib/perl/5.18/DynaLoader.pm >>>>>>> 1.18 Errno /usr/lib/perl/5.18/Errno.pm >>>>>>> 0.17015 Error >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm >>>>>>> UNKNOWN Error::Simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm >>>>>>> 5.68 Exporter /usr/share/perl/5.18/Exporter.pm >>>>>>> 5.68 Exporter::Heavy /usr/share/perl/5.18/Exporter/Heavy.pm >>>>>>> UNKNOWN Fasta >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Fasta.pm >>>>>>> UNKNOWN FastaChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm >>>>>>> UNKNOWN FastaChunker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm >>>>>>> UNKNOWN FastaDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm >>>>>>> UNKNOWN FastaFile >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm >>>>>>> UNKNOWN FastaSeq >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm >>>>>>> 1.11 Fcntl /usr/lib/perl/5.18/Fcntl.pm >>>>>>> 2.84 File::Basename /usr/share/perl/5.18/File/Basename.pm >>>>>>> 2.26 File::Copy /usr/share/perl/5.18/File/Copy.pm >>>>>>> 1.20 File::Glob /usr/lib/perl/5.18/File/Glob.pm >>>>>>> 1.20 File::NFSLock >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm >>>>>>> 2.09 File::Path /usr/share/perl/5.18/File/Path.pm >>>>>>> 3.40 File::Spec /usr/lib/perl/5.18/File/Spec.pm >>>>>>> 3.40 File::Spec::Unix /usr/lib/perl/5.18/File/Spec/Unix.pm >>>>>>> 0.2304 File::Temp /usr/local/share/perl/5.18.1/File/Temp.pm >>>>>>> 1.09 File::Which >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm >>>>>>> 2.02 FileHandle /usr/share/perl/5.18/FileHandle.pm >>>>>>> 1.51 FindBin /usr/share/perl/5.18/FindBin.pm >>>>>>> UNKNOWN GFFDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> UNKNOWN GI /usr/local/annotation/maker2.31/bin/../lib/GI.pm >>>>>>> 2.42 Getopt::Long /usr/local/share/perl/5.18.1/Getopt/Long.pm >>>>>>> 6.02 HTTP::Date /usr/share/perl5/HTTP/Date.pm >>>>>>> 6.05 HTTP::Headers /usr/share/perl5/HTTP/Headers.pm >>>>>>> 6.06 HTTP::Message /usr/share/perl5/HTTP/Message.pm >>>>>>> 6.00 HTTP::Request /usr/share/perl5/HTTP/Request.pm >>>>>>> 6.04 HTTP::Response /usr/share/perl5/HTTP/Response.pm >>>>>>> 6.03 HTTP::Status /usr/share/perl5/HTTP/Status.pm >>>>>>> 1.28 IO /usr/lib/perl/5.18/IO.pm >>>>>>> 1.16 IO::File /usr/lib/perl/5.18/IO/File.pm >>>>>>> 1.34 IO::Handle /usr/lib/perl/5.18/IO/Handle.pm 
>>>>>>> 1.1 IO::Seekable /usr/lib/perl/5.18/IO/Seekable.pm >>>>>>> 1.21 IO::Select /usr/lib/perl/5.18/IO/Select.pm >>>>>>> 1.36 IO::Socket /usr/lib/perl/5.18/IO/Socket.pm >>>>>>> 1.33 IO::Socket::INET /usr/lib/perl/5.18/IO/Socket/INET.pm >>>>>>> 1.24 IO::Socket::UNIX /usr/lib/perl/5.18/IO/Socket/UNIX.pm >>>>>>> 1.13 IPC::Open3 /usr/share/perl/5.18/IPC/Open3.pm >>>>>>> 0.53 Inline /usr/local/share/perl/5.18.1/Inline.pm >>>>>>> UNKNOWN Inline::denter >>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm >>>>>>> UNKNOWN Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm >>>>>>> UNKNOWN Iterator::Any >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm >>>>>>> UNKNOWN Iterator::Fasta >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm >>>>>>> UNKNOWN Iterator::GFF3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm >>>>>>> 6.05 LWP /usr/share/perl5/LWP.pm >>>>>>> UNKNOWN LWP::MemberMixin /usr/share/perl5/LWP/MemberMixin.pm >>>>>>> 6.00 LWP::Protocol /usr/share/perl5/LWP/Protocol.pm >>>>>>> 6.05 LWP::UserAgent /usr/share/perl5/LWP/UserAgent.pm >>>>>>> 0.33 List::MoreUtils >>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm >>>>>>> 1.38 List::Util /usr/local/lib/perl/5.18.1/List/Util.pm >>>>>>> UNKNOWN MAKER::ConfigData >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm >>>>>>> 1.32 POSIX /usr/lib/perl/5.18/POSIX.pm >>>>>>> 0.01 Parallel::Application::MPI >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI >>>>>>> .pm >>>>>>> 0.02 Perl::Unsafe::Signals >>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm >>>>>>> UNKNOWN PhatHit_utils >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm >>>>>>> UNKNOWN PostData >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm >>>>>>> 1.0 Proc::ProcessTable_simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm >>>>>>> 1.0 Proc::Signal >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm >>>>>>> UNKNOWN Process::MpiChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm >>>>>>> UNKNOWN Process::MpiTiers >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm >>>>>>> 1.38 Scalar::Util /usr/local/lib/perl/5.18.1/Scalar/Util.pm >>>>>>> 1.02 SelectSaver /usr/share/perl/5.18/SelectSaver.pm >>>>>>> UNKNOWN Shadower >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm >>>>>>> UNKNOWN SimpleCluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm >>>>>>> 2.009 Socket /usr/lib/perl/5.18/Socket.pm >>>>>>> UNKNOWN SpaceBase >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm >>>>>>> 2.45 Storable /usr/local/lib/perl/5.18.1/Storable.pm >>>>>>> 1.07 Symbol /usr/share/perl/5.18/Symbol.pm >>>>>>> 1.17 Sys::Hostname /usr/lib/perl/5.18/Sys/Hostname.pm >>>>>>> 0.21 Sys::SigAction >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm >>>>>>> UNKNOWN Sys::SigAction::Alarm >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm >>>>>>> 4.02 Term::ANSIColor /usr/share/perl/5.18/Term/ANSIColor.pm >>>>>>> 4.2 Tie::Handle /usr/share/perl/5.18/Tie/Handle.pm >>>>>>> 1.04 Tie::Hash /usr/share/perl/5.18/Tie/Hash.pm >>>>>>> 4.3 Tie::StdHandle /usr/share/perl/5.18/Tie/StdHandle.pm >>>>>>> 1.9726 Time::HiRes /usr/local/lib/perl/5.18.1/Time/HiRes.pm >>>>>>> 1.2300 Time::Local /usr/share/perl/5.18/Time/Local.pm >>>>>>> 1.60 URI /usr/share/perl5/URI.pm >>>>>>> 3.31 URI::Escape /usr/share/perl5/URI/Escape.pm >>>>>>> UNKNOWN Widget >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm >>>>>>> UNKNOWN Widget::RepeatMasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm >>>>>>> UNKNOWN Widget::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastx 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome. >>>>>>> pm >>>>>>> UNKNOWN Widget::exonerate::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.p >>>>>>> m >>>>>>> UNKNOWN Widget::exonerate::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2geno >>>>>>> me.pm >>>>>>> UNKNOWN Widget::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm >>>>>>> >>>>>>> UNKNOWN Widget::formater >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm >>>>>>> >>>>>>> UNKNOWN Widget::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm >>>>>>> >>>>>>> UNKNOWN Widget::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm >>>>>>> >>>>>>> UNKNOWN Widget::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm >>>>>>> >>>>>>> UNKNOWN Widget::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm >>>>>>> >>>>>>> 0.16 XSLoader /usr/share/perl/5.18/XSLoader.pm >>>>>>> 0.21 attributes /usr/lib/perl/5.18/attributes.pm >>>>>>> >>>>>>> 2.18 base /usr/share/perl/5.18/base.pm >>>>>>> 1.04 bytes /usr/share/perl/5.18/bytes.pm >>>>>>> UNKNOWN clean >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm >>>>>>> UNKNOWN cluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm >>>>>>> >>>>>>> UNKNOWN compare >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm >>>>>>> >>>>>>> 1.27 constant /usr/share/perl/5.18/constant.pm >>>>>>> >>>>>>> UNKNOWN ds_utility >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm >>>>>>> >>>>>>> UNKNOWN exonerate::splice_info >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm >>>>>>> >>>>>>> 0.34 forks /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 2.08001 forks::Devel::Symdump >>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm >>>>>>> 0.34 forks::shared /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> >>>>>>> 0.34 forks::signals >>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm >>>>>>> 1.00 integer /usr/share/perl/5.18/integer.pm >>>>>>> >>>>>>> 0.63 lib /usr/lib/perl/5.18/lib.pm >>>>>>> 1.02 locale /usr/share/perl/5.18/locale.pm >>>>>>> UNKNOWN maker::auto_annotator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm >>>>>>> >>>>>>> UNKNOWN maker::join >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm >>>>>>> >>>>>>> UNKNOWN maker::quality_index >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm >>>>>>> >>>>>>> UNKNOWN maker::sens_spec >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm >>>>>>> >>>>>>> 1.22 overload /usr/share/perl/5.18/overload.pm >>>>>>> >>>>>>> 0.02 overloading /usr/share/perl/5.18/overloading.pm >>>>>>> >>>>>>> 0.225 parent /usr/share/perl/5.18/parent.pm >>>>>>> UNKNOWN polisher >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::altest >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::est >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm >>>>>>> >>>>>>> UNKNOWN repeat_mask_seq >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm >>>>>>> >>>>>>> 0.1 runlog >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm >>>>>>> UNKNOWN shadow_AED >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm >>>>>>> 1.07 sigtrap /usr/share/perl/5.18/sigtrap.pm >>>>>>> >>>>>>> 1.07 strict /usr/share/perl/5.18/strict.pm >>>>>>> 1.77 threads /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 1.33 threads::shared >>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> 1.03 vars /usr/share/perl/5.18/vars.pm >>>>>>> 1.18 warnings /usr/share/perl/5.18/warnings.pm >>>>>>> >>>>>>> 1.02 warnings::register >>>>>>> /usr/share/perl/5.18/warnings/register.pm >>>>>>> STATUS: Parsing control files... >>>>>>> Calling GI::load_control_files at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452. >>>>>>> Calling GI::new_instance_temp at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463. >>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 465. >>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 483. >>>>>>> STATUS: Processing and indexing input FASTA files... >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 529. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. 
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. 
>>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. 
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 574. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 623. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> STATUS: Setting up database for any GFF3 input... 
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 629. >>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 631. >>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 635. >>>>>>> A data structure will be created for you at: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_datastore >>>>>>> >>>>>>> To access files for individual sequences use the datastore index: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_master_datastore_index.log >>>>>>> >>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 636. >>>>>>> Calling Iterator::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639. >>>>>>> Calling Iterator::Fasta::skip_file at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641. >>>>>>> Calling Iterator::Fasta::step at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643. >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!! 
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... 
>>>>>>> #--------------------------------------------------------------------- >>>>>>> Now retrying the contig!! >>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> Tries: 2!! >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... >>>>>>> >>>>>>> >>>>>>> Maker is now finished!!! >>>>>>> >>>>>>> Many thanks for your help >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt : >>>>>>> Your problem is one of the following. You need to reinstall the >>>>>>> DBD::SQLite module, you are running in a directory you don't have >>>>>>> permissions for, you set your TMPDIR environmental variable or TMP value >>>>>>> in maker_opts.ctl to an NFS-mounted or memory-mounted directory, or you >>>>>>> are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that >>>>>>> has issues (probably with DB or SQLite modules). You can also >>>>>>> completely delete the output directory, and start again to see if it was >>>>>>> just a random error. You should look at each of those first. You can >>>>>>> also run MAKER with the --debug command line flag and send it to me if >>>>>>> all of those seem not to be the issue. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Wednesday, March 19, 2014 at 5:09 AM >>>>>>> To: >>>>>>> Subject: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I'm installing/using maker2 for the first time and I get an error when >>>>>>> using it. >>>>>>> >>>>>>> I'm certainly missing something, but I don't know what. >>>>>>> >>>>>>> I compiled maker with no error message and I have all these directories >>>>>>> after compilation: >>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README src >>>>>>> >>>>>>> Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I >>>>>>> get this error: >>>>>>> >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) >>>>>>> failed: unable to open database file at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> ... >>>>>>> >>>>>>> ideas? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> >>>>> >>>> >>> >> > From jfierst at uoregon.edu Fri Mar 21 10:43:59 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 21 Mar 2014 08:43:59 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi, I just wanted to say thanks for all your help- I did the reciprocal best blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you > need to know which genes you want to translate names to (two column input > file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST > forward is the best way to do this, although I do think maker_map_ids is > the tool to use in the end.
The question is how to make a list of IDs to > translate as the input to maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then > do reciprocal best BLAST hits. To do this you BLAST your reference > proteins against your maker proteins. Then do the opposite, BLAST your > maker proteins against your reference proteins. If they are both each > other's best hit, then they are orthologous, and you can safely make a two > column entry for the maker_map_ids input (i.e. maker-gene-1 translates into > smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, So do you have one strain that you want to use as the reference > for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name.
I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Fri Mar 21 10:54:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 09:54:15 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I'm glad we could help. --Carson
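The reciprocal-best-hit recipe Carson describes in this thread can be sketched in Python as below. This is a minimal sketch, not a MAKER script, and all function and file names are hypothetical. It assumes both BLASTP searches were run with BLAST+ tabular output (`-outfmt 6`, default 12 columns), for example `blastp -query maker_proteins.fasta -db reference_proteins -outfmt 6 -out maker_vs_ref.tsv` plus the reverse search, so that column 1 is the query ID, column 2 the subject ID and column 12 the bit score. The output is a two-column, tab-separated map (column 1 original ID, column 2 new ID), the kind of file Janna reports feeding to map_fasta_ids and map_gff_ids.

```python
"""Sketch: reciprocal best BLAST hits -> two-column ID map.

Assumes BLAST+ tabular output (-outfmt 6, default 12 columns): column 1 is
the query ID, column 2 the subject ID, column 12 the bit score.
Not part of MAKER; all names here are hypothetical.
"""
import csv
import io
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to (subject ID, bit score) of its top hit.

    The highest bit score wins; on a tie the first row seen is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(
    maker_vs_ref: Iterable[List[str]], ref_vs_maker: Iterable[List[str]]
) -> List[Tuple[str, str]]:
    """Return sorted (maker_id, ref_id) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    pairs = []
    for maker_id, (ref_id, _score) in forward.items():
        back = reverse.get(ref_id)
        if back is not None and back[0] == maker_id:
            pairs.append((maker_id, ref_id))
    return sorted(pairs)


def write_map(pairs: Iterable[Tuple[str, str]], handle) -> None:
    """Write tab-separated lines: column 1 original ID, column 2 new ID."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)


def rbh_map_from_files(maker_vs_ref_path: str, ref_vs_maker_path: str,
                       out_path: str) -> int:
    """Read two -outfmt 6 files, write the ID map, return the pair count."""
    with open(maker_vs_ref_path, newline="") as fwd, \
            open(ref_vs_maker_path, newline="") as rev:
        pairs = reciprocal_best_hits(csv.reader(fwd, delimiter="\t"),
                                     csv.reader(rev, delimiter="\t"))
    with open(out_path, "w", newline="") as out:
        write_map(pairs, out)
    return len(pairs)


if __name__ == "__main__":
    # Tiny made-up tables: maker-gene-2's best hit is smug2, but smug2's
    # best hit is maker-gene-3, so only maker-gene-1 <-> smug1 survives.
    fwd_demo = (
        "maker-gene-1\tsmug1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "maker-gene-1\tsmug2\t40.0\t280\t160\t5\t1\t280\t1\t280\t1e-20\t90\n"
        "maker-gene-2\tsmug2\t60.0\t200\t80\t2\t1\t200\t1\t200\t1e-50\t180\n"
    )
    rev_demo = (
        "smug1\tmaker-gene-1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "smug2\tmaker-gene-3\t95.0\t280\t14\t0\t1\t280\t1\t280\t1e-140\t500\n"
    )
    demo_pairs = reciprocal_best_hits(
        csv.reader(io.StringIO(fwd_demo), delimiter="\t"),
        csv.reader(io.StringIO(rev_demo), delimiter="\t"),
    )
    out = io.StringIO()
    write_map(demo_pairs, out)
    print(out.getvalue(), end="")  # maker-gene-1<TAB>smug1
```

On real data, a call such as `rbh_map_from_files("maker_vs_ref.tsv", "ref_vs_maker.tsv", "strain.map")` (hypothetical file names) would produce the map. The sketch ranks hits by bit score rather than e-value because very strong hits often all report an e-value of 0.0, which cannot separate them.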
From Hossein.Borhan at AGR.GC.CA Fri Mar 21 11:41:38 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Fri, 21 Mar 2014 16:41:38 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Dear Carson I ran maker and modified .pm files and it resolved the problem with the fasta output. Thanks a lot for your help. HB On 14-03-17 1:45 PM, "Carson Holt" wrote: >I have attached 4 files for you to place in the .../maker/Widgets/ >directory. > >The *blast.pm files will suppress the BLAST+ failures you are getting >(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same >effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. >In the latest release NCBI redid all their warnings and error codes so it >spits out a lot of garbage and fails with different messages than it did >before. For example BLAST now warns you every time it encounters a fasta >header with a comment (virtually every fasta entry in existence falls in >this category), so your screen will be awash with meaningless warning >messages. > >The fgenesh.pm file will fix the other failure, which only occurs if you >use fgenesh simultaneously with the est_fustion=1 option. No other >predictors are affected. > >Thanks, >Carson > > >On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: > >>Dear Carson >> >>Sorry for the late reply. I was away for a couple of days. I have >>uploaded >>the output files plus control and error output to the FTP site that you >>provided. >>The user ID is borhanh >> >>I used blast+ for this run. >> >> >> >> >>Regards >> >> >>HB >> >> >> >> >> >> >> >> >>On 14-03-13 10:00 AM, "Carson Holt" >>wrote: >> >>>Just resending this to the correct maker-devel address. Please when >>>replying, do not CC the incorrect maker-devel-bounce address.
>>> >>>Thanks, >>>Carson >>> >>> >>>On 3/13/14, 9:56 AM, "Carson Holt" >>>wrote: >>> >>>>FGENESH is not a heavily used tool, so depending on which version it is >>>>(either too old or too new), output might be slightly different which >>>>could cause incorrect parsing. Could you tar up your maker.output >>>>folder, >>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>(send me your user/guest ID after you upload). >>>> >>>>For the BLAST error, use BLAST+ instead. You are using blastall which >>>>is >>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>blast type in maker_bopts.ctl and the location of executables in >>>>maker_exe.ctl. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>> >>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Dear Maker users >>>>> >>>>> >>>>>I ran maker (2.31) on a fungal genome and found out that it had inserted >>>>>the >>>>>word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>>into the nucleotide sequence of some of the genes. This seems >>>>>to >>>>>be related to transcripts predicted by fgenesh_masked.
>>>>> >>>>> >>>>>Here is an example for one of the genes >>>>> >>>>> >>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>offset:0 AE >>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>TTTCGACAAGC >>>>> >>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>without such problem. I checked the sequence for the scaffold related >>>>>to >>>>>one of the affected transcripts and there was no error in the >>>>>sequence. >>>>>I am not sure what is causing this. The only error that I could spot >>>>>in >>>>>the output error file is the following >>>>> >>>>> >>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>contexts/frames of query sequences. 
>>>>> >>>>> >>>>> >>>>>Your help is appreciated >>>>> >>>>> >>>>> >>>>>HB >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >> > From carsonhh at gmail.com Fri Mar 21 11:43:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 10:43:10 -0600 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts Message-ID: Thanks for letting me know. --Carson On 3/21/14, 10:41 AM, "Borhan, Hossein" wrote: >Dear Carson > >I ran maker and modified .pm files and it resolved the problem with the >fasta output. Thanks a lot for your help. > > > > >HB > > > > > > > > >On 14-03-17 1:45 PM, "Carson Holt" wrote: > >>I have attached 4 files for you to place in the .../maker/Widgets/ >>directory. >> >>The *blast.pm files will suppress the BLAST+ failures you are getting >>(alternatively you can just downgrade to BLAST 2.27 to get the same >>effect). BLAST 2.29 gives a lot of warnings etc., which you can ignore. >>In the latest release NCBI redid all their warnings and error codes so it >>spits out a lot of garbage and fails with different messages than it did >>before. For example BLAST now warns you every time it encounter a fasta >>header with a comment (virtually every fasta entry in existence falls in >>this category), so your screen will be awash with meaningless warning >>messages. >> >>The fgenesh.pm file will fix the other failure, which only occurs if you >>use fgenesh simultaneously with the est_fustion=1 option. No other >>predictors are affected. >> >>Thanks, >>Carson >> >> >>On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: >> >>>Dear Carson >>> >>>Sorry for the late reply. I was away for a couple of days. I have >>>uploaded >>>the out put files plus control and error output on the FTP site that you >>>provided >>>The user ID is borhanh >>> >>>I used blast+ for this run. 
>>> >>> >>> >>> >>>Regards >>> >>> >>>HB >>> >>> >>> >>> >>> >>> >>> >>> >>>On 14-03-13 10:00 AM, "Carson Holt" >>>wrote: >>> >>>>Just resending this to the correct maker-devel address. Please when >>>>replying, do not CC the incorrect maker-devel-bounce address. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>>On 3/13/14, 9:56 AM, "Carson Holt" >>>>wrote: >>>> >>>>>FGENESH is not a heavily used tool, so depending on which version it >>>>>is >>>>>(either too old or too new), output might be slightly different which >>>>>could cause incorrect parsing. Could you tar up your maker.output >>>>>folder, >>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>(send me either your user/guest ID after you upload). >>>>> >>>>>For the BLAST error, use BLAST+ instead. You are using blastall which >>>>>is >>>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>>blast type in maker_bopts.ctl and the location of executables in >>>>>maker_exe.ctl. >>>>> >>>>>Thanks, >>>>>Carson >>>>> >>>>> >>>>> >>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>>wrote: >>>>> >>>>>>Dear Maker users >>>>>> >>>>>> >>>>>>I ran maker (2.31) on a fungal genome and found out that it inserted >>>>>>the >>>>>>word SCLAR followed by a pair of bracket like this (0x22de7020) >>>>>>inserted in the nucleotide sequence of some of the genes. This seems >>>>>>to >>>>>>be related to transcripts predicted by fgenesh_masked. 
>>>>>> >>>>>> >>>>>>Here is an example for one of the genes >>>>>> >>>>>> >>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>>offset:0 AE >>>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>>TTTCGACAAGC >>>>>> >>>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>>without such problem. I checked the sequence for the scaffold related >>>>>>to >>>>>>one of the affected transcripts and there was no error in the >>>>>>sequence. >>>>>>I am not sure what is causing this. The only error that I could spot >>>>>>in >>>>>>the output error file is the following >>>>>> >>>>>> >>>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>>contexts/frames of query sequences. 
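To find which transcripts in a MAKER FASTA file carry this kind of corruption (the SCALAR(0x...) strings above, or the stray '-' characters reported later in this archive), a quick scan for characters outside the expected nucleotide alphabet can help. The Python below is a minimal sketch, not part of MAKER; it assumes sequences should contain only A, C, G, T and N, and the function name find_bad_records is hypothetical.

```python
import re

# Assumption: transcript sequences should contain only A, C, G, T and N.
# Widen this character class if your assembly uses other IUPAC codes.
BAD_CHARS = re.compile(r"[^ACGTNacgtn]")


def find_bad_records(lines):
    """Yield (record_id, sorted unexpected characters) for each FASTA record
    whose sequence contains anything outside the allowed alphabet."""
    record_id, bad = None, set()
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # Report the previous record before starting a new one.
            if record_id is not None and bad:
                yield record_id, sorted(bad)
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            bad = set()
        elif line:
            bad.update(BAD_CHARS.findall(line))
    if record_id is not None and bad:
        yield record_id, sorted(bad)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for rid, chars in find_bad_records(handle):
                print(rid, "".join(chars))
```

Only the first word of each header is reported, so the output can be matched back to IDs in the GFF3 file.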
>>>>>> >>>>>> >>>>>> >>>>>>Your help is appreciated >>>>>> >>>>>> >>>>>> >>>>>>HB >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Mon Mar 24 05:08:25 2014 From: marc.hoeppner at imbim.uu.se (=?iso-8859-1?Q?Marc_H=F6ppner?=) Date: Mon, 24 Mar 2014 10:08:25 +0000 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se> Hi, I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and set alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel. However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files? Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From sujaikumar at gmail.com Mon Mar 24 09:15:16 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 14:15:16 +0000 Subject: [maker-devel] Dashes in transcript predictions Message-ID: Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. 
Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. all the .maker.output files are attached. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: nGt.0.3.035610.maker.output.tgz Type: application/x-gzip Size: 45641 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 24 11:49:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 10:49:46 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: I've actually never seen that before, but looking through your output it appears to be specifically caused by setting correct_est_fusion=1, and how it interacts with some features of your dataset. I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download. Thanks, Carson From: Sujai Date: Monday, March 24, 2014 at 8:15 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Dashes in transcript predictions Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA 
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN KKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. all the .maker.output files are attached. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: join.pm Type: text/x-perl-script Size: 18644 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 24 12:05:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 11:05:15 -0600 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: It's not so much intentional as it is a limitation of the information in GFF3 format alignments. Right now protein2genome for eukaryotes will only try to make exonerate-derived alignments work, because they have been polished around splice sites and MAKER still has access to the original protein sequence and alignment CIGAR string for additional filtering, etc. With GFF3 pass-through the algorithm doesn't know nearly as much about what is passed in.
For example, the protein sequence is gone, CIGAR alignment strings are rarely included (Gap= attribute in GFF3), and it's not always clear if the alignment was polished for splice sites. Also, since protein2genome=1 is expected to be used only to generate an initial training set, and not for final annotations, this is considered a reasonable restriction. If you still really want to force protein alignments from a GFF3 to be considered as potential models, you could supply them as pred_gff, in which case they will always be considered as potential models. Of course the results will be relatively ugly, because you lack the things I mentioned before, such as the alignment CIGAR string and original protein sequence, that are normally used to filter protein2genome results for inclusion as models. --Carson On 3/24/14, 4:08 AM, "Marc Höppner" wrote: >Hi, > >I had previously inquired about protein-based gene building (for example >to create a training set for SNAP). This is currently possible with Maker >(2.31), but I noticed a limitation. Specifically, I tend to run Maker >once to generate all the raw computes (protein and set alignments, >mostly). I then separate these out into GFF files that I can store away >and use in various combinations of settings and data in parallel. > >However, the protein2genome option does not seem to work off pre-aligned >protein data (e.g. protein2genome.gff produced with Maker). Is that >intentional and is there a work-around? Or is the only option to run this >with fasta files? > >Cheers, > >Marc > > >Marc P.
Hoeppner, PhD > >Department for Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoeppner at imbim.uu.se > > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Mar 24 13:15:39 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 12:15:39 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: One more note on this. The sequence is actually fully correct if you just remove the '-' characters. So if you don't want to rerun MAKER with the patch, then you can use the attached script to just repair the transcript file by removing the '-' characters. Your GFF3 files and proteins files should already be correct as is. Usage --> perl fix_dash transcript_file.fasta > new_file.fasta You may need to place the script in the .../maker/bin/ directory so it can detect BioPerl if you don't have BioPerl installed system wide. Thanks, Carson From: Carson Holt Date: Monday, March 24, 2014 at 10:49 AM To: Sujai , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Dashes in transcript predictions I've actually never seen that before, but looking through your output it appears to be specifically caused by setting correct_est_fusion=1, and how it interacts with some features of your dataset. I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download. Thanks, Carson From: Sujai Date: Monday, March 24, 2014 at 8:15 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Dashes in transcript predictions Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. 
Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN KKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. all the .maker.output files are attached. 
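Carson's fix_dash script is not reproduced in this archive. As a hedged illustration of the same idea (removing '-' from sequence lines while leaving FASTA headers alone, since MAKER IDs such as snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 legitimately contain dashes), a minimal Python sketch might look like this. The script and function names are hypothetical; this is not the attached Perl script.

```python
import sys


def strip_dashes(lines):
    """Yield FASTA lines with '-' removed from sequence lines only.

    Header lines (starting with '>') pass through untouched, because MAKER
    IDs such as snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1
    legitimately contain dashes.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("-", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python strip_dashes.py transcript_file.fasta > new_file.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(strip_dashes(handle))
```

As Carson notes, only the transcript FASTA needs this repair; the GFF3 and protein files are reported to be correct as is.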
_______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sujaikumar at gmail.com Mon Mar 24 13:17:02 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 18:17:02 +0000 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: Wow. That was a super quick response. Thanks very much for confirming the problem and the fixes! On 24 March 2014 18:15, Carson Holt wrote: > One more note on this. The sequence is actually fully correct if you just > remove the '-' characters. So if you don't want to rerun MAKER with the > patch, then you can use the attached script to just repair the transcript > file by removing the '-' characters. Your GFF3 files and proteins files > should already be correct as is. > > Usage --> perl fix_dash transcript_file.fasta > new_file.fasta > > You may need to place the script in the .../maker/bin/ directory so it can > detect BioPerl if you don't have BioPerl installed system wide. > > Thanks, > Carson > > From: Carson Holt > Date: Monday, March 24, 2014 at 10:49 AM > To: Sujai , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] Dashes in transcript predictions > > I've actually never seen that before, but looking through your output it > appears to be specifically caused by setting correct_est_fusion=1, and how > it interacts with some features of your dataset. > > I've attached a patch in the form of a file you can use to replace > .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER > download. 
> > Thanks, > Carson > > > From: Sujai > Date: Monday, March 24, 2014 at 8:15 AM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Dashes in transcript predictions > > Dear Maker Team > > On a recent run with maker 2.31, I noticed that a couple of the > transcripts had dashes/hyphens in them. > > Example: > >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript > offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 > TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG > AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT > GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG > GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT > AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG > ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT > AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT > TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA > > The protein prediction for this transcript is ok: > > >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 > eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 > > MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY > > DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV > > Is this a known bug? 
I tried searching for "dash|hyphen" in the email list > but couldn't find anything else. > > Best wishes, > > - Sujai > > ps. I pulled out just this one contig and ran maker on it. all the > .maker.output files are attached. > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From diana.garnica at anu.edu.au Mon Mar 24 18:11:01 2014 From: diana.garnica at anu.edu.au (Diana Garnica Moreno) Date: Mon, 24 Mar 2014 23:11:01 +0000 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> Hi there, We recently assembled a fungal genome using MAKER and we got the gene models, the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the fasta sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? Thanks! Diana -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Mon Mar 24 18:25:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 17:25:09 -0600 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Message-ID: You are probably getting the wrong proteins from your scripts because you are not taking into account the 5' and 3' UTRs in the transcript. For example: >snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240 Here the 5' UTR is 261 bp long (the offset value, which is also the first QI field) and the 3' UTR is 22 bp long (the eighth QI field). Both have to be trimmed before translating the transcript into a protein. Once they are trimmed, you can translate in frame 0. The fasta_tool that comes with MAKER can be used to quickly trim the UTRs. Example: fasta_tool maker_transcripts.fasta --trim_maker_utr Then you can try your other scripts again. Thanks, Carson From: Diana Garnica Moreno Date: Monday, March 24, 2014 at 5:11 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Hi there, We recently assembled a fungal genome using MAKER and we got the gene models, the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the fasta sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? Thanks!
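Carson's recipe can be expressed directly in terms of the header: drop the first N bases (the 5' UTR, first QI field) and the last M bases (the 3' UTR, eighth QI field), then read the remainder in frame 0. The Python below is a minimal sketch under that assumption. It is not MAKER's fasta_tool, and the function names are hypothetical. Whether the trimmed region ends with the terminal stop codon should be checked against your own output rather than assumed; ends_with_stop only reports what is there.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def parse_qi(header):
    """Return (utr5_len, utr3_len) from a MAKER-style FASTA header.

    Assumes the QI layout seen in this thread, e.g.
    QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240, where field 1 is the
    5' UTR length and field 8 is the 3' UTR length.
    """
    match = re.search(r"QI:(\S+)", header)
    if match is None:
        raise ValueError("no QI field in header: " + header)
    fields = match.group(1).split("|")
    return int(fields[0]), int(fields[7])


def trim_utr(header, seq):
    """Drop both UTRs so the returned sequence can be read from frame 0."""
    utr5, utr3 = parse_qi(header)
    return seq[utr5:len(seq) - utr3]


def ends_with_stop(cds):
    """True if the last codon of the trimmed sequence is a standard stop."""
    return len(cds) >= 3 and cds[-3:].upper() in STOP_CODONS
```

For the header in Carson's example, parse_qi returns (261, 22), matching the UTR lengths he quotes.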
Diana _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Tue Mar 25 08:24:14 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Tue, 25 Mar 2014 09:24:14 -0400 Subject: [maker-devel] Maker iPlant image Message-ID: Greetings, I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From ernesto at ebi.ac.uk Tue Mar 25 05:10:59 2014 From: ernesto at ebi.ac.uk (ernesto lowy gallego) Date: Tue, 25 Mar 2014 10:10:59 +0000 Subject: [maker-devel] Incorrect translation start codon Message-ID: <53315633.2070702@ebi.ac.uk> Hi, I have been inspecting the MAKER predictions and I detected a situation which appears with a certain frequency. (See attached Apollo screenshot illustrating the situation I am going to describe): Let's say that there is est2genome evidence supporting the prediction of the 5' UTR region, I have realized that in some of these transcripts with 5'UTR, MAKER is not capable of identifying the right downstream ATG protein start codon and considers a TTG codon (coding for L) as the incorrect protein start. The proper ATG codon start is further downstream, as the Ab-initio predictors (SNAP+AUGUSTUS) correctly predict in this case (see the attached screenshot) Any comments on this? Thanks! 
ernesto -- Developer VectorBase | Ensembl Genomes -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2014-03-25 at 09.34.16.png Type: image/png Size: 32220 bytes Desc: not available URL: From carsonhh at gmail.com Tue Mar 25 09:19:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Mar 2014 08:19:22 -0600 Subject: [maker-devel] Incorrect translation start codon In-Reply-To: <53315633.2070702@ebi.ac.uk> References: <53315633.2070702@ebi.ac.uk> Message-ID: This is caused by BioPerl's is_start_codon method and default codon table returning true for non-canonical start codons. It was resolved some time ago (see the previous discussion: https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY). Make sure you are using the most recent version of MAKER (currently 2.31). Thanks, Carson On 3/25/14, 4:10 AM, "ernesto lowy gallego" wrote: >Hi, > >I have been inspecting the MAKER predictions and I detected a situation >which appears with a certain frequency. >(See attached Apollo screenshot illustrating the situation I am going to >describe): > >Let's say that there is est2genome evidence supporting the prediction of >the 5' UTR region, I have realized that in some of these transcripts >with 5'UTR, MAKER is not capable of identifying the right downstream ATG >protein start codon and considers a TTG codon (coding for L) as the >incorrect protein start. The proper ATG codon start is further >downstream, as the Ab-initio predictors (SNAP+AUGUSTUS) correctly >predict in this case (see the attached screenshot) > >Any comments on this? > >Thanks!
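To illustrate the mechanism Carson describes: the NCBI standard genetic code (translation table 1) lists TTG and CTG as alternative initiation codons alongside ATG, so a start-codon test based on that table will also accept a leucine codon such as TTG sitting upstream of the real ATG. The Python below is a minimal sketch of that effect only. It is not BioPerl's is_start_codon or MAKER's start-selection logic, and first_start is a hypothetical helper.

```python
# Start codons per the NCBI standard genetic code (translation table 1),
# which lists TTG and CTG as alternative initiation codons besides ATG.
TABLE1_STARTS = {"TTG", "CTG", "ATG"}
ATG_ONLY = {"ATG"}


def first_start(region, starts):
    """Return the 0-based offset of the first in-frame start codon, or -1."""
    seq = region.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in starts:
            return i
    return -1
```

With region = "TTGAAACCCATGGGG", the table-1 set picks offset 0 (the TTG), while the ATG-only set picks offset 9, which is the kind of discrepancy visible in the screenshot ernesto describes.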
> >ernesto > >-- >Developer > >VectorBase | Ensembl Genomes > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Mar 25 09:24:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Mar 2014 08:24:36 -0600 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: --> /opt/maker/bin/maker It looks like most preinstalled software is under /opt on the image. Thanks, Carson From: Daniel Standage Date: Tuesday, March 25, 2014 at 7:24 AM To: Maker Mailing List Subject: [maker-devel] Maker iPlant image Greetings, I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Tue Mar 25 11:33:59 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 25 Mar 2014 11:33:59 -0500 Subject: [maker-devel] maker to EvidenceModeler Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> Hi Carson and others, Is there an easy tool/pipeline available as part of maker utilities to convert maker and SNAP output to files acceptable by EvidenceModeler? It looks like it also needs just gff files, but with a few tweaks. 
EvidenceModeler seems better equipped to handle PASA annotation results than maker results. Thanks Dhivya From barry.utah at gmail.com Tue Mar 25 12:51:38 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 25 Mar 2014 11:51:38 -0600 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> Message-ID: Hi Diana, There is a Perl library, the Genome Annotation Library (GAL), that is designed to make writing code like this easy. I just added a script to this library called gal_CDS_sequence, which you would run like this: gal_CDS_sequence --translate genes.gff3 genome.fasta The focus of GAL is to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files. You can access the library here: http://www.sequenceontology.org/software/GAL.html Support for GAL is available via the SO mailing list: https://lists.sourceforge.net/lists/listinfo/song-devel Hope that helps, Barry On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote: > Hi there, > > We recently assembled a fungal genome using MAKER and we got the gene models, the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the fasta sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example).
I am not sure why it happens, and since it happens with different tools (different python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? > > Thanks! > > Diana > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kchilds at plantbiology.msu.edu Wed Mar 26 09:21:36 2014 From: kchilds at plantbiology.msu.edu (Childs, Kevin) Date: Wed, 26 Mar 2014 14:21:36 +0000 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: Daniel, There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environment variables immediately after starting an ssh session:
export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp
The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires BLAT, which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image. Augustus should work with a stock HMM on this image.
I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Plant Biology Department Michigan State University kchilds at plantbiology.msu.edu 517-775-2844 (m) 517-353-5969 (l) On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > --> /opt/maker/bin/maker > > It looks like most preinstalled software is under /opt on the image. > > Thanks, > Carson > > > From: Daniel Standage > Date: Tuesday, March 25, 2014 at 7:24 AM > To: Maker Mailing List > Subject: [maker-devel] Maker iPlant image > > Greetings, > > I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From steinj at cshl.edu Wed Mar 26 13:41:37 2014 From: steinj at cshl.edu (Stein, Joshua) Date: Wed, 26 Mar 2014 18:41:37 +0000 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: Also please note that there is a tutorial available here, particularly important if you want to use in MPI mode. https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial Josh Joshua Stein, PhD Manager, Sci. 
Informatics III Cold Spring Harbor Laboratory steinj at cshl.edu http://ware.cshl.org/ On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" wrote: > Daniel, > > There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environmental variables immediately after starting an ssh session. > > export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/ > export ZOE=/opt/maker/exe/snap > export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config > export TMP=/tmp > > The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires blat which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image. > > Augustus should work with a stock hmm on this image. > > I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. > > Kevin Childs > > > --- > Kevin Childs, PhD > > Assistant Professor - Fixed Term > Plant Biology Department > Michigan State University > > kchilds at plantbiology.msu.edu > 517-775-2844 (m) > 517-353-5969 (l) > > On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > >> --> /opt/maker/bin/maker >> >> It looks like most preinstalled software is under /opt on the image. 
>> >> Thanks, >> Carson >> >> >> From: Daniel Standage >> Date: Tuesday, March 25, 2014 at 7:24 AM >> To: Maker Mailing List >> Subject: [maker-devel] Maker iPlant image >> >> Greetings, >> >> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sat Mar 29 11:24:05 2014 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sat, 29 Mar 2014 11:24:05 -0500 Subject: [maker-devel] Missing UTRs in GFF Message-ID: I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file. 
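Until the cause is tracked down, one way to patch an affected transcript by hand is to re-split its exons once the true coding span is known. The sketch below assumes you can recover the genomic start and end of the coding region for the transcript (for example from the correct protein length Ben mentions); `split_exons` is a hypothetical helper, and it does not recompute the CDS phase column, which would still need to be set before writing rows back to GFF3.

```python
def split_exons(exons, cds_start, cds_end, strand):
    """Split exon intervals into UTR and CDS pieces.

    exons: list of (start, end), 1-based inclusive genomic coordinates.
    cds_start, cds_end: genomic span of the coding region (cds_start <= cds_end).
    strand: '+' or '-'; decides which flank is 5' and which is 3'.
    Returns a list of (feature_type, start, end) in genomic order.
    """
    if strand == "+":
        left_utr, right_utr = "five_prime_UTR", "three_prime_UTR"
    else:
        left_utr, right_utr = "three_prime_UTR", "five_prime_UTR"

    pieces = []
    for start, end in sorted(exons):
        # Portion of the exon lying before the coding region.
        if start < cds_start:
            pieces.append((left_utr, start, min(end, cds_start - 1)))
        # Portion of the exon overlapping the coding region.
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:
            pieces.append(("CDS", lo, hi))
        # Portion of the exon lying after the coding region.
        if end > cds_end:
            pieces.append((right_utr, max(start, cds_end + 1), end))
    return pieces
```

Each returned tuple maps onto one GFF3 row under the same mRNA parent, which restores the separate UTR lines that the other genes in the file already have.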
Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: missing_utr.gff Type: application/octet-stream Size: 2933 bytes Desc: not available URL: From mhinsley at ebi.ac.uk Mon Mar 31 05:20:10 2014 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Mon, 31 Mar 2014 11:20:10 +0100 Subject: [maker-devel] putative preponderance of short exons?? Message-ID: <5339415A.1020509@ebi.ac.uk> Hi I've run Maker on a de novo assembly of a species of fly and then ran some simple statistics (intron/exon/CDS length, exons per gene) over the GFF output and compared them with a couple of other species. It all looks good except that there is a surprising number of very short exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see the attached PDF, where black is Drosophila, red is A. gambiae, and green is with 5' and 3' exons removed). I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP. I'm using maker 2.31 (unpatched). Anecdotally, these short exons appear without EST or protein evidence and they all line up with canonical splice sequences (GT----AG), but I've only looked at a few using Apollo. While there's no requirement that exons should be longer, I'm suspicious of this, as there must be some evolutionary relationship between these species. I've compared with another species annotated with Maker (using SNAP and Augustus), which is more distant (not yet publicly available), and the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop codons, but this does not appear to be the case (mostly they are mid-gene). Is there some way to adjust the predictors, e.g. to require external evidence? Or is there anything else you could suggest? I can see the following in the tutorial, but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no

thanks -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom -------------- next part -------------- A non-text attachment was scrubbed... Name: exon_53.pdf Type: application/pdf Size: 10618 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 31 08:52:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 07:52:15 -0600 Subject: [maker-devel] putative preponderance of short exons?? In-Reply-To: <5339415A.1020509@ebi.ac.uk> References: <5339415A.1020509@ebi.ac.uk> Message-ID: The intron/exon structure is determined by SNAP, Augustus, etc. It is not affected by any of the maker parameters. Only evidence alignments are affected by the maker settings. You can try retraining or manually editing the HMMs, but they might also be regions where your assembly is incorrect and those algorithms make short exons in order to make a structure work without getting stop codons mid-gene.
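Short-exon counts like the ones Malcolm reports can be recomputed after each retraining round to see whether they move. A minimal sketch, assuming a MAKER-style GFF3 file in which gene-model exons use the feature type `exon`; `exon_length_counts` and its default thresholds are illustrative, not a MAKER tool.

```python
def exon_length_counts(gff_lines, thresholds=(10, 30, 50)):
    """Count exon features shorter than each threshold (bp) in GFF3 lines.

    Returns ({threshold: count}, total_exons).
    """
    counts = {t: 0 for t in thresholds}
    total = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):  # stop before any embedded sequences
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        # GFF3 coordinates are 1-based and inclusive.
        length = int(cols[4]) - int(cols[3]) + 1
        total += 1
        for t in thresholds:
            if length < t:
                counts[t] += 1
    return counts, total
```

Passing an open file handle works directly, since iterating over it yields lines; an exon row shared by several isoforms is counted once, because MAKER lists it once with multiple parents.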
Thanks, Carson On 3/31/14, 4:20 AM, "Malcolm Hinsley" wrote: >Hi > >I've run Maker on a de novo assembly of a species of fly and then ran >some simple statistics (intron/exon/CDS length, exons per gene) over >the GFF output and compared with a couple of other species. >It all looks good except that there is a surprising number of very short >exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total - see attached >pdf), black is Drosophila, red is A. gambiae, green is with 5' and 3' >exons removed). > >I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP. >I'm using maker 2.31 (unpatched). > >Anecdotally, these short exons appear without EST or protein evidence >and they all line up with canonical splice sequences (GT----AG). >(but I've only looked at a few using Apollo). > >While there's no requirement that exons should be longer I'm suspicious >of this as there must be some evolutionary relationship between these >species. >I've compared with another species annotated with Maker (using SNAP >and Augustus) which is more distant (not yet publicly available), and >the same pattern of short exons is present. >I wondered if they were created to fulfil the need for start/stop >codons, but this does not appear to be the case (mostly they are >mid-gene). > > >Is there some way to adjust the predictors eg to require external >evidence? or anything else you could suggest? ...
I can see the >following in the tutorial but I'm not sure how they could help: > >pred_flank=200 #flank for extending evidence clusters sent to gene >predictors >pred_stats=0 #report AED and QI statistics for all predictions as well as >models >AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and >1) >min_protein=0 #require at least this many amino acids in predicted >proteins >alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = >yes, 0 = no >always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 >= no > > >thanks > >-- >malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 >European Bioinformatics Institute (EMBL-EBI) >European Molecular Biology Laboratory >Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD >United Kingdom > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Mar 31 09:37:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 08:37:15 -0600 Subject: [maker-devel] Missing UTRs in GFF In-Reply-To: References: Message-ID: Not something I've seen before, but there was a patch for another issue that was caused by the use of avoid_est_fusion=1, which may be related. Try the current stable release 2.31, and let me know if it still happens. You can also upload the contig folder from one of the regions in question here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Then I could verify the bug, and see if it is something that happens in the current release. --Carson From: Benjamin Rubin Date: Saturday, March 29, 2014 at 10:24 AM To: Subject: [maker-devel] Missing UTRs in GFF I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs.
For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file. Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
From carson.holt at genetics.utah.edu Mon Mar 3 12:08:49 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 3 Mar 2014 19:08:49 +0000 Subject: [maker-devel] FW: error runinig agustus In-Reply-To: References: Message-ID: Forwarding this to the maker-devel list. On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: >I encountered the following error while running maker (2nd annotation >using gff file of the first maker run and trinity assembled RNA seq as >EST) > >ERROR: Augustus failed >--> rank=NA, hostname=rapa.agr.gc.ca > >Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >am using 2.31 > >Your help is appreciated > > >HB > > > > > From carsonhh at gmail.com Mon Mar 3 12:11:08 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 03 Mar 2014 12:11:08 -0700 Subject: [maker-devel] FW: error runinig agustus Message-ID: You will need to provide more detail. Probably the entire error log and the maker control files. Thanks, Carson On 3/3/14, 12:08 PM, "Carson Holt" wrote: >Forwarding this to the maker-devel list.
> > >On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: > >>I encountered the following error while running maker (2nd annotation >>using gff file of the first maker run and trinity assembled RNA seq as >>EST) >> >>ERROR: Augustus failed >>--> rank=NA, hostname=rapa.agr.gc.ca >> >>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >>am using 2.31 >> >>Your help is appreciated >> >> >>HB >> >> >> >> >> > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From sjackman at gmail.com Tue Mar 4 19:10:42 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 4 Mar 2014 18:10:42 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and *three_prime_UTR* features. Ideally the feature type would be set to *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: > Set single_exon=1, and the minimum size to a smaller value. I think it's > set to 250 right now. Also est2genome is looking for ORF, so if there is > none (as with tRNAs) they probably won't get picked up. > > --Carson > > Sent from my iPhone > > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > > Sorry, ignore my previous question.
est_forward also carries forward the > names of protein evidence and works like a charm. Thank you! > > The larger rrn16 and rrn23 genes annotated perfectly, but the smaller > rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They > are in the blastn output, and in the evidence_0.gff. rrn5 has perfect > identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value > (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing > these hits? > > organism_type=prokaryotic > est2genome=1 > protein2genome=1 > est_forward=1 > > Cheers, > Shaun > > > On 27 February 2014 15:17, Shaun Jackman wrote: > >> Is there a corresponding protein_forward=1 option to map forward protein >> names from protein2genome? >> >> Cheers, >> Shaun >> >> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >> wrote: >> >> Sorry I meant to say prefilter on the score in the mRNA column before >> passing the gff3 to model_gff. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >> >> What you can do is run it once with just est_forward=1 and >> est2genome/protein2genome set to 1. Then take those results, pass them in >> as model_gff and use the map_forward option to then filter the results >> based on mRNA score and that would copy names onto new gene under the >> standard MAKER pipeline. Eventually it's really supposed to go into a >> separate tool that will map genes onto new assemblies (but under the hood >> the tool will just be calling MAKER with certain parameters restricted). I >> do this because if people commonly use it mixed with things like SNAP I can >> start to get some very weird behaviors.
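Carson's suggestion to prefilter on the mRNA score before passing the GFF3 back in as model_gff could look roughly like this. It is a minimal sketch assuming that the score to filter on sits in column 6 of the `mRNA` rows and that genes, mRNAs and their children are linked through standard GFF3 `ID`/`Parent` attributes; `filter_by_mrna_score` is a hypothetical helper. Rows whose `Parent` lists several mRNAs are kept if any of them passes, and rows unrelated to a kept mRNA are dropped.

```python
def _ids(attrs, key):
    """Split a comma-separated GFF3 attribute value into a set of non-empty IDs."""
    return {v for v in attrs.get(key, "").split(",") if v}


def filter_by_mrna_score(gff_lines, min_score):
    """Keep mRNAs scoring >= min_score (column 6) with their genes and children."""
    rows = []
    for line in gff_lines:
        if line.startswith("##FASTA"):  # MAKER GFF3 files may end with sequences
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        rows.append((line.rstrip("\n"), cols[2], cols[5], attrs))

    # First pass: decide which mRNAs (and therefore which genes) survive.
    keep_mrna, keep_gene = set(), set()
    for _, ftype, score, attrs in rows:
        if ftype == "mRNA" and score != "." and float(score) >= min_score:
            keep_mrna |= _ids(attrs, "ID")
            keep_gene |= _ids(attrs, "Parent")

    # Second pass: emit kept genes, mRNAs and any child rows of kept mRNAs.
    out = ["##gff-version 3"]
    for line, ftype, _, attrs in rows:
        if ftype == "gene":
            keep = bool(_ids(attrs, "ID") & keep_gene)
        elif ftype == "mRNA":
            keep = bool(_ids(attrs, "ID") & keep_mrna)
        else:
            keep = bool(_ids(attrs, "Parent") & keep_mrna)
        if keep:
            out.append(line)
    return out
```

If your file stores the value to filter on somewhere else (for example as an attribute rather than in column 6), only the comparison in the first pass would need to change.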
>> >> Thanks, >> Carson >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 3:04 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> It seems that this could be a very useful option in those cases where >> you have firm a priori knowledge of the placement of ESTs. However, while >> trying it I note that est_forward implies that the est2genome predictor is >> turned on, implicitly. Is this necessary for this to work? I'm after the >> behavior you describe below where exonerate is made to try really hard >> within a limited region to align an est, but I would not like maker to >> produce est2genome predictions. >> >> In general, I think this maker_coor and est_forward is a feature set that >> is worthy to be promoted into a documented feature. >> >> Thanks, >> Mikael >> >> On 26 Feb 2014, at 17:09, Carson Holt wrote: >> >> It will still work without est_forward. It just works a little >> differently. Keep in mind this was a hidden feature I used to find >> stubborn or hard to find missing genes after reassembly of a genome. >> >> If est_forward is provided, MAKER will parse the database to look for the >> maker_coor tags early in the pipeline. Then it will create a list of >> locations to search, and it will search them even if there are no BLAST >> results to seed the search (normally MAKER gets a BLAST result first and >> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >> look for a match using all of chr1 as the input to exonerate even when >> BLAST finds nothing (this is a very very slow search, but can help pick up >> one or two stubborn genes that don't remap well). To allow this, MAKER >> gives exonerate looser matching parameters (i.e. allows for single base >> pair introns perhaps caused by assembly errors).
The logic here is that >> given the fact that I already told MAKER that with some degree of >> confidence I expect sequence A to map to location X, it will try its >> hardest to make it match. >> >> Without est_forward set, the maker_coor= flag still gets read in GI.pm at >> line 1563, but only after a BLAST alignment has already seeded it to the >> region (that BLAST result has the information in its description >> parameter). MAKER will then ignore seeds completely outside of maker_coor. >> In addition any BLAST seeds that overlap maker_coor will get the search >> space for alignment polishing adjusted to match maker_coor exactly. Also >> match parameters for exonerate will not be relaxed as they were with >> est_forward. >> >> As you can see, the behavior is slightly different (because it's an >> accidental feature). >> >> Thanks, >> Carson >> >> >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 6:37 AM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> That might be a useful and time saving accidental feature. But, reading >> the code, it seems that I need to supply maker_coor but not gene_id, as >> well as the configuration option est_forward for this to work. Any >> occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 >> right? >> >> Mikael >> >> On 26 Feb 2014, at 14:22, Carson Holt wrote: >> >> Yes. That should work as well as an accidental feature. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling < >> mikael.durling at slu.se> wrote: >> >> Can this use of maker_coor be used only to hint about the placement of >> the ests, without affecting the naming of the final genes? I.e., if I have a >> database of EST where I have a priori knowledge of their rough placement, >> can this placement be given to maker without providing est_forward=1? >> >> Thanks, >> Mikael >>
>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >> >> There is a way. It's not a standard option and it's undocumented, but >> if you add est_forward=1 to the maker_opts.ctl file, then it will do just >> that. The option won't already be there so you'll have to type it in. >> >> There is also a feature designed to work with this option. If you add >> tags to your fasta headers, those can be used to guide the mapping and >> naming. For example, gene_id= will ensure different isoforms >> that share a common gene_id get clustered into the same gene, >> and maker_coor=chr1:1-10000 in the fasta header will force a particular >> sequence to only be mapped against chr1 within the range of 1-10000 bp and >> just using maker_coor=chr1 will force it to only be mapped against chr1. >> >> This is an undocumented way to remap genes onto new assemblies using >> blast alignments of earlier transcript or protein annotations as a guide. >> >> --Carson >> >> >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, February 25, 2014 at 5:06 PM >> To: >> Subject: [maker-devel] Mapping gene names >> >> Hi, >> >> I'm annotating a genome using a closely related genome from Genbank, >> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to >> annotate my genome. I've run Maker, and the annotation seems to have worked >> well. Is it possible to map the names of the genes from the related species >> to my annotation? I see the *map_forward* option, which applies to the >> *model_gff* parameter. Is there a similar option for *est* and *protein*?
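Because est_forward=1 has to be typed into maker_opts.ctl by hand, and other tweaks in this thread (single_exon=1, a smaller single_length) change existing lines, a small script can make those edits repeatable. A minimal sketch, assuming the control file uses the `key=value #comment` layout visible in the option list quoted earlier in this archive; `set_ctl_options` is a hypothetical helper, not a MAKER utility.

```python
import re

# key=value, optionally followed by a "#comment" (assumed maker_opts.ctl layout)
_OPT = re.compile(r"^(\w+)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(text, **options):
    """Set options in maker_opts.ctl-style text; append keys that are absent."""
    lines = text.splitlines()
    remaining = dict(options)
    for i, line in enumerate(lines):
        m = _OPT.match(line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            # Replace only the value and keep the original trailing comment.
            lines[i] = f"{key}={remaining.pop(key)}{m.group(3) or ''}"
    # Options such as est_forward are not in the generated file, so append them.
    for key, value in remaining.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Usage would be along the lines of `path.write_text(set_ctl_options(path.read_text(), est_forward=1, single_exon=1, single_length=50))` with `path` a `pathlib.Path` pointing at the control file; only the first occurrence of each key is rewritten.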
>> >> *maker_opts.ctl* >> >> est=NC_123456.frn >> protein=NC_123456.faa >> est2genome=1 >> protein2genome=1 >> >> Thanks, >> Shaun >> _______________________________________________ maker-devel mailing >> list maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 4 19:33:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 04 Mar 2014 19:33:12 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm.
Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: > Set single_exon=1, and the minimum size to a smaller value. I think it's set > to 250 right now. Also est2genome is looking for ORF, so if there is none (as > with tRNAs) they probably won't get picked up. > > --Carson > > Sent from my iPhone > > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > >> Sorry, ignore my previous question. est_forward also carries forward the >> names of protein evidence and works like a charm. Thank you! >> >> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 >> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the >> blastn output, and in the evidence_0.gff. rrn5 has perfect identity, >> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < >> eval_blastn=1e-10). How should I debug which filter is removing these hits? >> organism_type=prokaryotic >> est2genome=1 >> protein2genome=1 >> est_forward=1 >> Cheers, >> Shaun >> >> >> >> On 27 February 2014 15:17, Shaun Jackman wrote: >>> Is there a corresponding protein_forward=1 option to map forward protein >>> names from protein2genome? >>> >>> >>> Cheers, >>> Shaun >>> >>> >>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>> ) wrote: >>> >>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>> passing the gff3 to model_gff.
>>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>> >>>>> What you can do is run it once with just est_forward=1 and >>>>> est2genome/protein2genome set to 1. Then take those results, pass them in >>>>> as model_gff and use the map_forward option to then filter the results >>>>> based on mRNA score and that would copy names onto new gene under the >>>>> standard MAKER pipeline. Eventually it's really supposed to go into a >>>>> separate tool that will map genes onto new assemblies (but under the hood >>>>> the tool will just be calling MAKER with certain parameters restricted). >>>>> I do this because if people commonly use it mixed with things like SNAP I >>>>> can start to get some very weird behaviors. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: Mikael Brandström Durling >>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>> To: Carson Holt >>>>> Cc: "maker-devel at yandell-lab.org" >>>>> Subject: Re: [maker-devel] Mapping gene names >>>>> >>>>> It seems that this could be a very useful option in those cases where you >>>>> have firm a priori knowledge of the placement of ESTs. However, while >>>>> trying it I note that est_forward implies that the est2genome predictor is >>>>> turned on, implicitly. Is this necessary for this to work? I'm after the >>>>> behavior you describe below where exonerate is made to try really hard >>>>> within a limited region to align an est, but I would not like maker to >>>>> produce est2genome predictions. >>>>> >>>>> In general, I think this maker_coor and est_forward is a feature set that >>>>> is worthy to be promoted into a documented feature. >>>>> >>>>> Thanks, >>>>> Mikael >>>>> >>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote: >>>>> >>>>>> It will still work without est_forward. It just works a little >>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>> >>>>>> If est_forward is provided, MAKER will parse the database to look for the >>>>>> maker_coor tags early in the pipeline. Then it will create a list of >>>>>> locations to search, and it will search them even if there are no BLAST >>>>>> results to seed the search (normally MAKER gets a BLAST result first and >>>>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >>>>>> look for a match using all of chr1 as the input to exonerate even when >>>>>> BLAST finds nothing (this is a very very slow search, but can help pick >>>>>> up one or two stubborn genes that don't remap well). To allow this, >>>>>> MAKER gives exonerate looser matching parameters (i.e. allows for single >>>>>> base pair introns perhaps caused by assembly errors). The logic here is >>>>>> that given the fact that I already told MAKER that with some degree of >>>>>> confidence I expect sequence A to map to location X, it will try its >>>>>> hardest to make it match. >>>>>> >>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at >>>>>> line 1563, but only after a BLAST alignment has already seeded it to the >>>>>> region (that BLAST result has the information in its description >>>>>> parameter). MAKER will then ignore seeds completely outside of >>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get >>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>> exactly. Also match parameters for exonerate will not be relaxed as they >>>>>> were with est_forward. >>>>>> >>>>>> As you can see, the behavior is slightly different (because it's an >>>>>> accidental feature).
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Mikael Brandström Durling >>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>> To: Carson Holt >>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>> >>>>>> That might be a useful and time-saving accidental feature. But, reading >>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>> well as the configuration option est_forward, for this to work. All >>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on >>>>>> est_forward=1, right? >>>>>> >>>>>> Mikael >>>>>> >>>>>> On 26 Feb 2014 at 14:22, Carson Holt wrote: >>>>>> >>>>>>> Yes. That should work as well, as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have >>>>>>> a database of ESTs where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to MAKER without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014 at 01:58, Carson Holt wrote: >>>>>>> >>>>>>> There is a way. It's not a standard option and it's undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that. The option won't already be there, so you'll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your FASTA headers, those can be used to guide the mapping and >>>>>>> naming.
For example, gene_id= will ensure that different >>>>>>> isoforms sharing a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the FASTA header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, >>>>>>> while just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> BLAST alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm annotating a genome using a closely related genome from GenBank, >>>>>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence >>>>>>> to annotate my genome. I've run MAKER, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein?
>>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From felix.bemm at uni-wuerzburg.de Wed Mar 5 09:35:33 2014 From: felix.bemm at uni-wuerzburg.de (Felix Bemm) Date: Wed, 05 Mar 2014 17:35:33 +0100 Subject: [maker-devel] Build Issues - v2.31 Message-ID: <53175255.4050102@uni-wuerzburg.de> Hi, I am trying to build maker version 2.31.
Got the following error: Configuring MAKER with MPI support 'CCFLAGSEX' is not a valid config option for Inline::C at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 236 at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 256 Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', '/software/mpich2-1.5rc3/include', 'blib', '') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277 MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469 MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at ./Build line 70 Same procedure worked with 2.29-beta! Any ideas? Felix -- Felix Bemm Department of Bioinformatics University of Würzburg, Germany Tel: +49 931 - 31 83696 Fax: +49 931 - 31 84552 felix.bemm at uni-wuerzburg.de From carsonhh at gmail.com Wed Mar 5 09:40:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 09:40:05 -0700 Subject: [maker-devel] Build Issues - v2.31 In-Reply-To: <53175255.4050102@uni-wuerzburg.de> References: <53175255.4050102@uni-wuerzburg.de> Message-ID: You need to update your Inline::C module. The CCFLAGSEX option was added to Inline::C a couple of years ago to allow users to pass in flags to the compiler. 
Thanks, Carson From carson.holt at genetics.utah.edu Wed Mar 5 12:02:26 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 5 Mar 2014 19:02:26 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: On 3/5/14, 11:59 AM, "Borhan, Hossein" wrote: >Dear Maker users > >I want to run maker on a fungal genome of about 45 Mb, with about 1/3 of >the genome being repeat-rich. But most of the virulence genes are located >within the repeat regions, flanked by stretches of repeats. I am not sure >whether, if I use the RepeatMasker option, I am going to miss out on the >prediction of these virulence genes located within the repeats. > >Other concerns with the settings in the maker_opts file for fungal genomes are: > >single_exon = 0: should this be changed to 1, since single-exon genes >are quite common in fungi? And what is the consequence of this on using ESTs >and assembled RNA as evidence for gene prediction? > >correct_est_fusion=0 #limits use of ESTs in annotation >to avoid fusion genes: as I understand it, this option will remove the >overlapping UTRs, but what is the consequence of setting this option on >the use of ESTs for predicting ORFs? > >Thanks > >HB From carsonhh at gmail.com Wed Mar 5 12:17:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 12:17:57 -0700 Subject: [maker-devel] FW: maker-control file Message-ID:
Beside a gene being flanked by repeats does not mean it will be lost, any evidence/alignments that can seed in non-repetative regions (gene/exon) are still allowed to extend into repetitive regions during the polishing stage (aligners have two stages - seed and extend). So transposons should never seed, but genes will because there sequence will contain non-repetative regions (even if they are near repeats). single_exon should be set to 1 for fungi, just make sure to set the minimum length of single exon evidence to something reasonable like 250bp. correct_est_fusion should not be used together with est2genome. It won?t fail, you just get odd results. Actually est2genome should not ever be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (Ests rarely cover the whole transcript) and the results will have many false potties from background transcription events from your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP pr Augustus set and not est2genome (which should already have been turned off by then). Thanks, Carson > > >On 3/5/14, 11:59 AM, "Borhan, Hossein" <> wrote: > >>Dear Maker users >> >>I want to run maker on a fungal genome of about 45 Mb with about 1/3 of >>the genome begin repeat rich. But most of the virulent genes are located >>within the repeat regions flanked but stretch of repeats. I am not sure >>if I use the repeat masker option I am going to miss out on the >>predication of these virulent genes located within the repeats. 
>> >>Other concerns with the setting in maker-opts file for fungal genomes >>are: >> >>single_exon = 0 should this get changed to 1 since single exon genes >>are quit common in fungi and what is the consequence of this on using EST >>and assembled RNA as evidence for gene prediction >> >>correct_est_fusion=0 #limits use of ESTs in annotation >>to avoid fusion genes as I understand this option will remove the >>overlapping UTRs but what is the consequence of setting this option on >>the use of EST for predicting ORFs >> >> >>Thanks >> >> >> >>HB >> >> >> >> > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Thu Mar 6 00:26:29 2014 From: marc.hoeppner at imbim.uu.se (=?Windows-1252?Q?Marc_H=F6ppner?=) Date: Thu, 6 Mar 2014 07:26:29 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Hi, I think this is an interesting comment that I would like a few more information on: correct_est_fusion should not be used together with est2genome. It won?t fail, you just get odd results. Actually est2genome should not ever be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (Ests rarely cover the whole transcript) and the results will have many false potties from background transcription events from your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP pr Augustus set and not est2genome (which should already have been turned off by then). 
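The final-run settings recommended in the passage quoted above (single_exon=1 with a reasonable minimum single-exon evidence length, est2genome turned off, correct_est_fusion=1) could be applied to a maker_opts.ctl file with a small script. This is a sketch assuming the file consists of simple `key=value #comment` lines; `set_ctl_options` is a hypothetical helper, not part of MAKER, and any key missing from the file (such as the undocumented est_forward mentioned earlier in this thread) is simply appended at the end.

```python
from typing import Dict


def set_ctl_options(text: str, options: Dict[str, str]) -> str:
    """Return maker_opts.ctl-style text with the given keys set.

    Assumes simple "key=value #comment" lines; comments and all other
    lines are kept as they are.
    """
    remaining = dict(options)
    out_lines = []
    for line in text.splitlines():
        key, sep, rest = line.partition("=")
        key = key.strip()
        if sep and key in remaining:
            # Keep the trailing "#comment" (if any) after the new value.
            _old_value, hash_mark, comment = rest.partition("#")
            new_line = f"{key}={remaining.pop(key)}"
            if hash_mark:
                new_line += " #" + comment.lstrip()
            out_lines.append(new_line)
        else:
            out_lines.append(line)
    # Keys not present in the file (e.g. undocumented options) are appended.
    for key, value in remaining.items():
        out_lines.append(f"{key}={value}")
    return "\n".join(out_lines) + "\n"
```

For example, passing `{"single_exon": "1", "single_length": "250", "est2genome": "0", "correct_est_fusion": "1"}` would switch a training-style control file to the settings described above, leaving the explanatory comments in place.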
My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is very slow and painful. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. I'm not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab initio predictions, which to me seems problematic. /Marc From carsonhh at gmail.com Thu Mar 6 07:29:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 07:29:35 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: > Wouldn't it be more sensible to rely on the evidence over probabilistic > models? Yes. In fact, that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic).
Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (e.g. A. thaliana parameters to annotate C. elegans) you get as much as a 3-fold increase in exon-level accuracy by using the hint feedback from MAKER. With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). > The annotation would be partial, but on the other hand the chance of > incorporating false signals is smaller (assuming I can generate a clean set > of transcripts from RNA-seq data)? False signals are abundant. It's just the nature of how ESTs, and especially mRNA-seq reads, are generated and anchored back to the assembly. By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNA-seq), a lot of this is eliminated. > As an example, using SNAP and Augustus on a bird genome, with Augustus > achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of > false exons that were simply not supported by the RNA-seq data, yet made it > into the final gene build. You will get false positives from the est2genome-alone approach as well. Models will be more partial, and the false negative rate will be very high (often 30-70%). Also look at the MAKER2 paper, Figure 1. The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). > Is it possible to get some more details on how Maker uses ab initio predictions > and reconciles them with evidence alignments? At the moment it seems to me > that Maker gives higher weight to the ab initio predictions, which to me seems > problematic. Take a look at the MAKER, MAKER2, and MAKER-P papers.
Final genes are chosen based on evidence overlap using AED (completely evidence-based). It is the model generation that leverages the hint-based feedback. The names of MAKER genes tell you what the source of the model is. Any time a hint-based model matches the evidence better, the name will look like this -> maker---gene- (e.g. maker-chr1-snap-gene-0.4). When the ab initio model matches better than the hint-based model, the name looks like this -> --abinit-gene- (e.g. snap-chr1-abinit-gene-0.2). In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models. Thanks, Carson From marc.hoeppner at imbim.uu.se Thu Mar 6 07:40:48 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 14:40:48 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Hi Carson, Thanks for the detailed feedback; this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially with newer protocols that offer near-perfect strandedness.
But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment tho). But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation on that has only ever been discussed on this list here - best practices, experimental features etc. Regards, Marc On 06 Mar 2014, at 15:29, Carson Holt > wrote: Wouldn?t it be more sensible to rely on the evidence over probabilistic models? Yes. Infact that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabalistic). Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that eve when you use the wrong species parameters in the predictor (I.e. A. thaliana to annotate C. elegant) you get as much as a 3 fold increase in exon level accuracy by using the hint feedback from MAKER. With est2genome option you don?t get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). The annotation would be partial, but on the other hand the chance of incorporating false signals are smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? False signals are abundant. It?s just the nature of how ESTs and especially mRNAseq reads are generated and anchored back to the assembly. By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNAseq) a lot of this is eliminated. 
As an example, using SNAP and Augustus on a bird genome - with augustus achieving nucleotide and exon sensitivities in the 70-90% range gave a host if false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. You will get false positives from est2genome alone approach as well. Models will be more partial, and false negative rate will be very high (often 30-70% false negative rate). Also look at the MAKER2 paper Figure 1. The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). Is it possible to get some more details on how Maker uses ab-inito predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab-initio predictions, which to me seems problematic. Take a look at the MAKER, MAKER2, and MAKER-P papers. Final genes are chosen based off of evidence overlap using AED (completely evidence based). It is the model generation that leverages the hint based feedback. The names of MAKER genes can let you know what the source of the model is. Any time hint based models match the evidence better the name will have hame like this ?> maker---gene- (I.e. maker-chr1-snap-gene-0.4) When the ab initio model matches better than the hint based model the name is like this ?> --abinit-gene- (I.e. snap-chr1-abinit-gene-0.2) In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models. Thanks, Carson From: Marc H?ppner > Date: Thursday, March 6, 2014 at 12:26 AM To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] FW: maker-control file Hi, I think this is an interesting comment that I would like a few more information on: correct_est_fusion should not be used together with est2genome. It won?t fail, you just get odd results. 
Actually est2genome should not ever be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (Ests rarely cover the whole transcript) and the results will have many false potties from background transcription events from your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP pr Augustus set and not est2genome (which should already have been turned off by then). My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn?t it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals are smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I?d rather underestimate the exon inventory slightly than putting out an annotation with ~ 10% false exon calls. As an example, using SNAP and Augustus on a bird genome - with augustus achieving nucleotide and exon sensitivities in the 70-90% range gave a host if false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. Not sure what to think about that to be honest. Is it possible to get some more details on how Maker uses ab-inito predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab-initio predictions, which to me seems problematic. /Marc -------------- next part -------------- An HTML attachment was scrubbed... 
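Carson's reply in this thread notes that final genes are chosen by evidence overlap using AED (Annotation Edit Distance). As a rough illustration of the score itself, AED is commonly defined as 1 - (SN + SP)/2, where SN and SP are the nucleotide-level sensitivity and specificity of a model against its evidence, so 0 means perfect agreement and 1 means no overlap. The sketch below assumes the model and evidence are given as inclusive (start, end) intervals on one sequence and pools all evidence into a single set of bases; `aed` is a hypothetical helper for illustration, not MAKER's implementation.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates on one sequence


def _covered(intervals: List[Interval]) -> Set[int]:
    """Set of bases covered by a list of (start, end) intervals."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction: List[Interval], evidence: List[Interval]) -> float:
    """Annotation Edit Distance: 1 - (sensitivity + specificity) / 2."""
    pred, evid = _covered(prediction), _covered(evidence)
    if not pred or not evid:
        return 1.0  # nothing to compare: treat as no agreement
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # fraction of evidence bases predicted
    specificity = overlap / len(pred)  # fraction of predicted bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```

A model covering bases 1-100 against evidence covering 51-150 shares 50 bases with it, giving SN = SP = 0.5 and an AED of 0.5; a lower AED means the model is better supported.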
URL: From carsonhh at gmail.com Thu Mar 6 08:03:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 08:03:10 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Message-ID: MAKER wiki ?> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page Thanks, Carson From: Marc H?ppner Date: Thursday, March 6, 2014 at 7:40 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] FW: maker-control file Hi Carson, Thanks for the detailed feedback, this has cleared up a few things. I don?t necessarily share your view on the problematic nature of RNA-seq data - especially with newer protocols near-perfect strandedness. We work a lot on transcriptome assembly and with a stringent approach to transcript assembly I think I got better results with est2genome than trying to let Maker work with a semi-refined ab-initio model. But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment tho). But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation on that has only ever been discussed on this list here - best practices, experimental features etc. Regards, Marc On 06 Mar 2014, at 15:29, Carson Holt wrote: >> Wouldn?t it be more sensible to rely on the evidence over probabilistic >> models? > > Yes. Infact that is the backbone of MAKER. The evidence is used to derive > hints that are passed back into the predictors and reviewed in light of the > evidence to decide on final models (no longer strictly probabalistic). 
Take a > look at the MAKER2 paper (Table 2 and Figure 1) and you will see that eve when > you use the wrong species parameters in the predictor (I.e. A. thaliana to > annotate C. elegant) you get as much as a 3 fold increase in exon level > accuracy by using the hint feedback from MAKER. With est2genome option you > don?t get that hint feedback (normally probabilistic models, EST evidence, and > protein evidence would all work together), and the models are overall poorer > and contain more false positives (we have looked at this a lot). > > >> The annotation would be partial, but on the other hand the chance of >> incorporating false signals are smaller (assuming I can generate a clean set >> of transcripts from RNA-seq data)? > > False signals are abundant. It?s just the nature of how ESTs and especially > mRNAseq reads are generated and anchored back to the assembly. By letting > there be feedback between the probabilistic model and the evidence (both > protein and EST/mRNAseq) a lot of this is eliminated. > > >> As an example, using SNAP and Augustus on a bird genome - with augustus >> achieving nucleotide and exon sensitivities in the 70-90% range gave a host >> if false exons that were simply not supported by the RNAseq data, yet made it >> into the final gene build. > > You will get false positives from est2genome alone approach as well. Models > will be more partial, and false negative rate will be very high (often 30-70% > false negative rate). Also look at the MAKER2 paper Figure 1. The false > positive rate from ab initio alone can be quite high, but with the evidence > feedback it is substantially reduced (especially for poorly trained > predictors). > > >> Is it possible to get some more details on how Maker uses ab-inito >> predictions and reconciles them with evidence alignments? At the moment it >> seems to me that maker gives higher weight to the ab-initio predictions, >> which to me seems problematic. 
> > Take a look at the MAKER, MAKER2, and MAKER-P papers. Final genes are chosen > based off of evidence overlap using AED (completely evidence based). It is > the model generation that leverages the hint based feedback. The names of > MAKER genes can let you know what the source of the model is. Any time hint > based models match the evidence better the name will have hame like this ?> > maker---gene- (I.e. maker-chr1-snap-gene-0.4) > > When the ab initio model matches better than the hint based model the name is > like this ?> > --abinit-gene- (I.e. snap-chr1-abinit-gene-0.2) > > > In summary, using est2genome alone (while good for generating training sets) > undercuts the power of the evidence feedback together with the probabilistic > models. > > > Thanks, > Carson > > From: Marc H?ppner > Date: Thursday, March 6, 2014 at 12:26 AM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] FW: maker-control file > > Hi, > > I think this is an interesting comment that I would like a few more > information on: > >> >> correct_est_fusion should not be used together with est2genome. It won?t >> fail, you just get odd results. Actually est2genome should not ever be >> used to generate the final annotation set. It is a convenience method >> that allows you to generate rough models for training gene predictors like >> SNAP and Augustus. But once they are trained it should be turned off, >> because the models it produces will be partial (Ests rarely cover the >> whole transcript) and the results will have many false potties from >> background transcription events from your EST data. These models are good >> enough to train with, but make very poor final annotations. So in the end >> you should be using correct_est_fusion=1 with the SNAP pr Augustus set and >> not est2genome (which should already have been turned off by then). 
>>
>
> My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.
>
> As an example, using SNAP and Augustus on a bird genome - with Augustus achieving nucleotide and exon sensitivities in the 70-90% range gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which to me seems problematic.
>
> /Marc
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sjackman at gmail.com Thu Mar 6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward.

It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun

On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this.
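Shaun's two suggestions, typing RNA genes from their conventional name prefix ("rrn" for rRNA, "trn" for tRNA) and naming tRNAs by amino acid and anticodon (e.g. trnW-CCA for tryptophan with anticodon CCA), are simple enough to express directly. The Python helpers below are hypothetical post-processing utilities written for illustration; they are not MAKER options, and the one-letter table covers only the 20 standard amino acids.

```python
# Standard one-letter amino-acid codes keyed by three-letter abbreviations.
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def feature_type(gene_name):
    """Guess a GFF3 feature type from the conventional gene-name prefix."""
    prefix = gene_name[:3].lower()
    if prefix == "rrn":
        return "rRNA"
    if prefix == "trn":
        return "tRNA"
    return "mRNA"  # default: treat anything else as protein-coding


def trna_name(amino_acid, anticodon):
    """Build a conventional tRNA gene name such as trnW-CCA."""
    return "trn{}-{}".format(ONE_LETTER[amino_acid], anticodon.upper())


print(feature_type("rrn16"), feature_type("trnW-CCA"), feature_type("rbcL"))  # rRNA tRNA mRNA
print(trna_name("Trp", "cca"))  # trnW-CCA
```

Relying on the prefix alone is only safe when, as in Shaun's case, the evidence names come from a reference annotation that already follows the rrn/trn convention.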

Cheers,
Shaun

On 27 February 2014 17:13, Carson Holt wrote:

Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun

On 27 February 2014 15:17, Shaun Jackman wrote:

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
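Carson's suggested workflow is to run once with est_forward, keep only the models whose mRNA score passes a cutoff, and pass the survivors back in through model_gff with map_forward. The Python sketch below covers only the middle step, under stated assumptions: the score is read from column 6 of each mRNA line (one reading of "the score in the mRNA column"), higher is better, and every child feature points to its mRNA through a Parent attribute. `prefilter_gff3` is a hypothetical helper; it drops comments and any feature that is not part of a kept gene model.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ('ID=m1;Parent=g1') into a dict."""
    return dict(item.split("=", 1) for item in column9.split(";") if "=" in item)


def prefilter_gff3(lines, min_score):
    """Keep gene models whose mRNA score (column 6) is >= min_score."""
    rows = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip directives, comments, FASTA and malformed lines
        rows.append((cols, parse_attributes(cols[8])))

    # First pass: decide which mRNAs (and therefore which genes) survive.
    kept_mrnas, kept_genes = set(), set()
    for cols, attrs in rows:
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            kept_mrnas.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))

    # Second pass: emit the surviving genes, mRNAs and their child features.
    out = ["##gff-version 3"]
    for cols, attrs in rows:
        parents = set(attrs.get("Parent", "").split(","))
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:
            keep = bool(parents & kept_mrnas)  # exons, CDS, UTRs of kept mRNAs
        if keep:
            out.append("\t".join(cols))
    return out


gff = [
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t100\t0.9\t+\t.\tID=m1;Parent=g1",
    "chr1\tmaker\texon\t1\t100\t.\t+\t.\tParent=m1",
    "chr1\tmaker\tgene\t200\t300\t.\t+\t.\tID=g2",
    "chr1\tmaker\tmRNA\t200\t300\t0.2\t+\t.\tID=m2;Parent=g2",
    "chr1\tmaker\texon\t200\t300\t.\t+\t.\tParent=m2",
]
print("\n".join(prefilter_gff3(gff, 0.5)))  # keeps g1, m1 and m1's exon
```

Filtering in two passes keeps whole models together, so the file handed to model_gff never contains an mRNA without its gene or exons.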

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions.

In general, I think this maker_coor and est_forward feature set is worthy of being promoted to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt wrote:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
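The FASTA header tags Carson describes in this thread (maker_coor=chr1:1-10000 to restrict a sequence to a range, maker_coor=chr1 to restrict it to a whole sequence, and gene_id= to cluster isoforms) and the non-est_forward behavior (BLAST seeds completely outside maker_coor are ignored, while overlapping seeds get their polishing search space set to exactly the maker_coor region) can be modeled in a few lines. This Python sketch is an illustration only: the space-separated key=value header syntax and the helper names are assumptions, and MAKER's real logic lives in GI.pm.

```python
import re

# Assumed header syntax: space-separated key=value tags after the sequence ID.
COOR_TAG = re.compile(r"maker_coor=(?P<seq>[^\s:]+)(?::(?P<start>\d+)-(?P<end>\d+))?")
GENE_TAG = re.compile(r"gene_id=(?P<gene>\S+)")


def parse_header(header):
    """Return (gene_id, region); region is (seq, start, end), with start/end None for a whole sequence."""
    gene = GENE_TAG.search(header)
    coor = COOR_TAG.search(header)
    region = None
    if coor:
        start, end = coor.group("start"), coor.group("end")
        region = (coor.group("seq"),
                  int(start) if start else None,
                  int(end) if end else None)
    return (gene.group("gene") if gene else None), region


def polishing_space(seed, region):
    """Search space for one BLAST seed (seq, start, end), or None if the seed is ignored."""
    seed_seq, seed_start, seed_end = seed
    region_seq, region_start, region_end = region
    if seed_seq != region_seq:
        return None  # seed on another sequence: completely outside maker_coor
    if region_start is not None and (seed_end < region_start or seed_start > region_end):
        return None  # seed falls outside the tagged range
    return region    # overlapping seed: search space becomes the maker_coor region


gene_id, region = parse_header(">tx1 gene_id=g1 maker_coor=chr1:1-10000")
print(gene_id, region)                                  # g1 ('chr1', 1, 10000)
print(polishing_space(("chr1", 500, 900), region))      # ('chr1', 1, 10000)
print(polishing_space(("chr1", 20000, 20500), region))  # None
```

The relaxed exonerate parameters and seedless search that est_forward adds are deliberately left out; this only mirrors the seed-filtering rule.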

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will have the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt wrote:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt wrote:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.
If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, February 25, 2014 at 5:06 PM
To:
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 6 13:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Yes. I'll fix the naming.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Thu Mar 6 16:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To:
References:
Message-ID:

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the executables for wublast are blank in maker_exe.ctl. See, they're blank ->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable

You either need to provide executables or set your blast_type parameter to something else. For example, you could set it to 'NCBI+', but you will need to fix the location of makeblastdb. makeblastdb is set incorrectly here ->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable

Alternatively, you can set blast_type to 'NCBI', but you will need to uncomment the executables.
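Carson's diagnosis comes down to two checks: every executable needed by the chosen blast_type must be filled in, and each path must point at a binary (the makeblastdb entry in the quoted maker_exe.ctl appears to stop at the ncbi-blast-2.2.29+ directory, whereas the blastn entry ends in /bin/blastn). Below is a minimal Python sketch of those checks, assuming ctl lines are `key=value` with `#` starting a comment. The blast_type-to-executable mapping is read off the keys in that maker_exe.ctl, and both it and the helper names are illustrative assumptions rather than MAKER's own validation code.

```python
import os

# Executables each blast_type needs, read off the keys in maker_exe.ctl (assumed mapping).
REQUIRED = {
    "ncbi+": ["makeblastdb", "blastn", "blastx", "tblastx"],
    "ncbi": ["formatdb", "blastall"],
    "wublast": ["xdformat", "blasta"],
}


def parse_ctl(text):
    """Parse key=value lines; anything after the first '#' is treated as a comment."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def blast_problems(exe_ctl_text, blast_type):
    """List (key, reason) pairs for executables the chosen blast_type cannot use."""
    values = parse_ctl(exe_ctl_text)
    problems = []
    for key in REQUIRED[blast_type.lower()]:
        path = values.get(key, "")
        if not path:
            problems.append((key, "blank"))
        elif os.path.isdir(path):
            problems.append((key, "is a directory, not an executable"))
        elif not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append((key, "not an executable file"))
    return problems


example = """\
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable
"""
print(blast_problems(example, "wublast"))  # [('xdformat', 'blank'), ('blasta', 'blank')]
```

Because a commented-out value such as `formatdb=#/usr/local/bin/formatdb` parses as blank, the same check also shows why switching to 'NCBI' requires uncommenting those lines.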
They are commented out here ->
formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable

--Carson

On 3/6/14, 3:51 PM, "Borhan, Hossein" wrote:

>Hi
>
>I have installed the latest version of blast+ and provided the executable paths in maker_exe.ctl as follows:
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx #location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate #location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3 #location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild #location of probuild executable (required for genemark)
>
>But when running maker I get this error:
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the control files.

From sjackman at gmail.com Thu Mar 6 16:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Fantastic. Thanks, Carson.

When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun

On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:

Yes. I'll fix the naming.

Thanks,
Carson

From: Shaun Jackman
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward.

It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun

On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this.

Cheers,
Shaun

On 27 February 2014 17:13, Carson Holt wrote:

Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun

On 27 February 2014 15:17, Shaun Jackman wrote:

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work?
I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward feature set is worth promoting to a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
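The seed handling Carson describes for the case without est_forward (seeds entirely outside maker_coor are ignored; overlapping seeds have their polishing search space set to maker_coor exactly) can be sketched as follows. This is a hypothetical Python illustration of that described behavior, not the Perl code in GI.pm; the Region/Seed types, the polishing_window name and the 1-based inclusive coordinates are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A maker_coor target, e.g. chr1 (whole sequence) or chr1:1-10000."""
    seqid: str
    start: int = 1
    end: Optional[int] = None  # None means "to the end of the sequence"


@dataclass(frozen=True)
class Seed:
    """A BLAST hit that would normally seed exonerate polishing (1-based, inclusive)."""
    seqid: str
    start: int
    end: int


def overlaps(seed: Seed, region: Region) -> bool:
    """True if the seed shares at least one base with the region."""
    if seed.seqid != region.seqid:
        return False
    if region.end is not None and seed.start > region.end:
        return False
    return seed.end >= region.start


def polishing_window(seed: Seed, region: Region) -> Optional[Tuple[str, int, Optional[int]]]:
    """Search space for polishing, or None when the seed is ignored.

    Seeds entirely outside the region are dropped; overlapping seeds get
    their search space set to the region itself, not to the seed span.
    """
    if not overlaps(seed, region):
        return None
    return (region.seqid, region.start, region.end)
```

Note that a seed only partly inside chr1:1-10000 is not trimmed to the overlap; following Carson's wording, the whole maker_coor region becomes the search space.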
As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
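The FASTA header tags described above (gene_id= for clustering isoforms, maker_coor=seqid or maker_coor=seqid:start-end for restricting placement) can be illustrated with a small parser. This is a minimal sketch under stated assumptions: the thread does not give MAKER's exact header syntax, so whitespace-separated key=value tokens after the record ID, colon-free sequence IDs, and the helper names are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed syntax: seqid alone, or seqid:start-end (seqids without colons).
COOR_RE = re.compile(r"^(?P<seqid>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_header(header: str) -> dict:
    """Split '>id key=value key=value ...' into the record ID plus its tags."""
    tokens = header.lstrip(">").split()
    tags = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore free-text words without '='
            tags[key] = value
    return tags


def parse_maker_coor(value: str):
    """'chr1:1-10000' -> ('chr1', 1, 10000); 'chr1' -> ('chr1', None, None)."""
    m = COOR_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognized maker_coor value: {value!r}")
    start = int(m["start"]) if m["start"] else None
    end = int(m["end"]) if m["end"] else None
    return m["seqid"], start, end


def group_by_gene_id(headers):
    """Cluster record IDs that share a gene_id; untagged records stand alone."""
    groups = defaultdict(list)
    for header in headers:
        tags = parse_header(header)
        groups[tags.get("gene_id", tags["id"])].append(tags["id"])
    return dict(groups)
```

For example, headers `>tx1 gene_id=geneA maker_coor=chr1:1-10000` and `>tx2 gene_id=geneA` would be grouped under geneA, mirroring the isoform clustering Carson describes.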
--Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Mar 6 16:38:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:38:48 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Well... not really. I have no plans to add est2genome support for noncoding genes (non-trivial), so you would either have to remove the ncRNA from your input, or filter it out downstream. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 4:33 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Fantastic. Thanks, Carson.
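Carson's suggestion to remove the ncRNA from the input before est2genome can be sketched using the naming convention Shaun mentions (rRNA names start with "rrn", tRNA names with "trn"). A minimal sketch, assuming the gene name is the first whitespace-delimited token of each FASTA header; real GenBank .frn headers may need a different name-extraction step. The prefix table also doubles as the name-to-feature-type mapping Shaun proposed.

```python
# Naming convention from the thread: rRNA names start with "rrn", tRNA with "trn".
NONCODING_PREFIXES = {"rrn": "rRNA", "trn": "tRNA"}


def rna_type(name: str) -> str:
    """Feature type implied by a gene name; anything else stays MAKER's default mRNA."""
    for prefix, feature_type in NONCODING_PREFIXES.items():
        if name.startswith(prefix):
            return feature_type
    return "mRNA"


def read_fasta(text: str):
    """Return (header, sequence) pairs; multi-line sequences are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_noncoding(fasta_text: str) -> str:
    """Keep only records whose name (first header token, an assumption) looks coding."""
    kept = []
    for header, seq in read_fasta(fasta_text):
        name = header[1:].split()[0]
        if rna_type(name) == "mRNA":
            kept.append(f"{header}\n{seq}\n")
    return "".join(kept)
```

The filtered FASTA would then be passed as est= in place of the original file; the dropped rrn/trn records could instead be handled by tRNAscan or kept for a separate liftover.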
When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: > Yes. I'll fix the naming. > > Thanks, > Carson > > > From: Shaun Jackman > Date: Thursday, March 6, 2014 at 1:56 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I agree that identifying non-coding RNA by homology in general is > a non-trivial problem. In my particular case, I have a well annotated > reference species that is very closely related (99.2% sequence identity), so > lifting over the annotations from that reference species to my species should > be pretty straightforward. It would be great if MAKER had an option for RNA > sequence homology similar to est2genome that does not imply the sequence is > coding. > > The integration of MAKER-P with tRNAscan is very useful. The identified genes > are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names > with that convention? > > Cheers, > Shaun From sbrubaker at solazyme.com Thu Mar 6 16:41:55 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Thu, 6 Mar 2014 23:41:55 +0000 Subject: [maker-devel] Long introns from Augustus Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com> Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus. I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created? I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum. So far I don't really see any changes though. Any advice would be greatly appreciated. Thanks, Shane From carsonhh at gmail.com Thu Mar 6 16:46:53 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:46:53 -0700 Subject: [maker-devel] Long introns from Augustus Message-ID: Are these the ab initio calls that are merged, or final MAKER models?
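For Shane's question about fused models, one downstream check is to measure the intron lengths in the predicted models and flag any transcript whose introns exceed a chosen cutoff. A minimal sketch, assuming GFF3 input whose exon features are linked to transcripts by Parent= attributes (Augustus output may need converting first, or feature_type switched to CDS); the cutoff is a hypothetical value chosen for the genome at hand. This only finds candidate fused models; it does not change how Augustus predicts.

```python
from collections import defaultdict


def intron_lengths(gff3_text: str, feature_type: str = "exon"):
    """Map each Parent ID to the lengths of the gaps between its features."""
    blocks = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                blocks[parent].append((int(cols[3]), int(cols[4])))
    result = {}
    for parent, spans in blocks.items():
        spans.sort()
        # Gap between consecutive 1-based inclusive blocks = intron length.
        result[parent] = [b[0] - a[1] - 1 for a, b in zip(spans, spans[1:])]
    return result


def long_intron_models(gff3_text: str, max_intron: int, feature_type: str = "exon"):
    """Sorted IDs of transcripts with at least one intron longer than max_intron."""
    return sorted(
        parent
        for parent, introns in intron_lengths(gff3_text, feature_type).items()
        if any(length > max_intron for length in introns)
    )
```

Flagged transcripts can then be inspected, split, or dropped before they are passed on as predictions.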
--Carson From sbrubaker at solazyme.com Thu Mar 6 17:48:15 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Fri, 7 Mar 2014 00:48:15 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: References: Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. I had noticed that Maker does have some information about max intron length; does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. From mikael.durling at slu.se Mon Mar 10 04:27:25 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Mon, 10 Mar 2014 10:27:25 +0000 Subject: [maker-devel] keep_preds values Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Hi, Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either. Thanks, Mikael From robert.king at rothamsted.ac.uk Mon Mar 10 06:17:07 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Mon, 10 Mar 2014 12:17:07 +0000 Subject: [maker-devel] annotation comparison aed plots Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Dear Maker Developers, I've updated a reference that had errors and was a little incomplete, and I'm now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: UniProt proteins from a search using the species keyword "fusarium"; PubMed mRNA for the name of the organism; prior annotation reference transcripts. Annotation 2: UniProt proteins from a search using the species keyword "fusarium"; PubMed mRNA for the name of the organism; prior annotation reference transcripts; an mRNA Trinity assembly (pasafly) of a different strain (only RNA-seq available). I'm not sure if it was a smart move to use the prior annotation reference transcripts?
I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script, and I'm waiting on some Perl modules to be installed by our administrator to test it, along with the "Evaluator" and "compare" programs. What do those programs do? Best Wishes Rob From dence at genetics.utah.edu Mon Mar 10 08:47:42 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 10 Mar 2014 14:47:42 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Message-ID: Hi Mikael, The keep_preds parameter is often used like a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 to 1. A prediction with an AED of 0 matches the evidence exactly, while a prediction with an AED of 1 isn't overlapped by any evidence. The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc).
Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. From a user standpoint, you will almost certainly lose gene models when you set keep_preds to an intermediate value, but you might benefit by knowing that all your models will now have an AED no higher than that value. I hope that helps; let me know if it didn't. ~Daniel PS The original paper that described the AED is Eilbeck et al. in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From carsonhh at gmail.com Mon Mar 10 09:51:21 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 08:51:21 -0700 Subject: [maker-devel] keep_preds values Message-ID: Actually that is false. The keep_preds option is still binary. Any value other than 0 sets it to true. There was discussion about making it a non-binary value, but that has not been implemented.
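The AED that Daniel describes (0 when a prediction matches the evidence exactly, 1 when no evidence overlaps it) is, in the Eilbeck et al. (2009) formulation he cites, one minus the average of nucleotide-level sensitivity and specificity. The sketch below reproduces only that arithmetic on 1-based inclusive intervals and treats all evidence as one merged set of bases, which is a simplification; MAKER computes AED itself. The cumulative_fraction helper is one simple, hypothetical way to summarize AED values when comparing two annotations, as Rob asked; it is not the script mentioned in his message.

```python
def merge(intervals):
    """Merge overlapping or touching 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(intervals) -> int:
    """Number of distinct bases covered by the intervals."""
    return sum(e - s + 1 for s, e in merge(intervals))


def shared_bases(a, b) -> int:
    """Number of bases covered by both interval sets."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:
                total += hi - lo + 1
    return total


def aed(prediction, evidence) -> float:
    """1 - (sensitivity + specificity) / 2 at the nucleotide level."""
    pred_bp, evid_bp = covered_bases(prediction), covered_bases(evidence)
    if pred_bp == 0 or evid_bp == 0:
        return 1.0
    shared = shared_bases(prediction, evidence)
    sensitivity = shared / evid_bp   # fraction of evidence bases the prediction covers
    specificity = shared / pred_bp   # fraction of predicted bases the evidence supports
    return 1 - (sensitivity + specificity) / 2


def cumulative_fraction(aeds, cutoffs):
    """Fraction of models with AED <= each cutoff (one simple way to compare annotations)."""
    n = len(aeds)
    return [sum(a <= c for a in aeds) / n for c in cutoffs]
```

Computing cumulative_fraction for each annotation over the same cutoffs (for example 0.0, 0.1, ..., 1.0) gives two curves that can be compared directly; the curve that rises faster has more of its models well supported by evidence.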
?Carson On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >Hi Mikael, > >The keep_preds parameter is often used the same as a binary parameter, >but it doesn't have to be. The concordance that is mentioned in the >comment line is the AED for that prediction. AED is a measurement of how >well a prediction is supported by the evidence and ranges from 0 - 1. A >prediction with an AED of 0 matches the evidence exactly while a >prediction with an AED of 1 isn't overlapped by any evidence. > >The default behavior for MAKER is to make a gene model out of a >prediction with any AED <1. When you change the keep_preds option from 0 >to 1, then MAKER will make a gene model out of any prediction that >matches the other parameters (like single_exon, min_exon, etc). Setting >the keep_preds option to somewhere in between 0 and 1 will set a ceiling >on the AED required for promoting a prediction to a gene model. > >From a user standpoint, when you will almost certainly lose gene models >when you set AED at an intermediate value, but you might benefit by >knowing that all your models will now have an AED of at least a certain >value. > >I hope that helps; let me know if it didn't. > >~Daniel > >PS The original paper that described the AED is Eilbeck et al in BMC >Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >Genetics paper from 2012. > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >Mikael Brandstr?m Durling [mikael.durling at slu.se] >Sent: Monday, March 10, 2014 4:27 AM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] keep_preds values > >Hi, > >Can someone, please, explain the keep_preds parameter, as it works now >with a value between 1 and 0? 
It used to be binary, but now it seems to >test concordance towards something. The maker wiki doesn?t explain it any >further either. > >Thanks, >Mikael > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mikael.durling at slu.se Mon Mar 10 08:57:23 2014 From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=) Date: Mon, 10 Mar 2014 14:57:23 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: Hi Carson and Daniel, That sounds more logical to me. Then it would be appropriate to change the comment of keep_preds in the generated config files. Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsported model? Mikael 10 mar 2014 kl. 16:51 skrev Carson Holt : > Actually that is false. The keep_preds option is still binary. Any value > other than 0 sets it to true. There was discussion about making it a > non-binary value, but that has not been implemented. > > ?Carson > > > On 3/10/14, 7:47 AM, "Daniel Ence" wrote: > >> Hi Mikael, >> >> The keep_preds parameter is often used the same as a binary parameter, >> but it doesn't have to be. The concordance that is mentioned in the >> comment line is the AED for that prediction. AED is a measurement of how >> well a prediction is supported by the evidence and ranges from 0 - 1. A >> prediction with an AED of 0 matches the evidence exactly while a >> prediction with an AED of 1 isn't overlapped by any evidence. 
>> >> The default behavior for MAKER is to make a gene model out of a >> prediction with any AED <1. When you change the keep_preds option from 0 >> to 1, then MAKER will make a gene model out of any prediction that >> matches the other parameters (like single_exon, min_exon, etc). Setting >> the keep_preds option to somewhere in between 0 and 1 will set a ceiling >> on the AED required for promoting a prediction to a gene model. >> >> From a user standpoint, when you will almost certainly lose gene models >> when you set AED at an intermediate value, but you might benefit by >> knowing that all your models will now have an AED of at least a certain >> value. >> >> I hope that helps; let me know if it didn't. >> >> ~Daniel >> >> PS The original paper that described the AED is Eilbeck et al in BMC >> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >> Genetics paper from 2012. >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >> Mikael Brandstr?m Durling [mikael.durling at slu.se] >> Sent: Monday, March 10, 2014 4:27 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] keep_preds values >> >> Hi, >> >> Can someone, please, explain the keep_preds parameter, as it works now >> with a value between 1 and 0? It used to be binary, but now it seems to >> test concordance towards something. The maker wiki doesn?t explain it any >> further either. 
>> >> Thanks, >> Mikael From carsonhh at gmail.com Mon Mar 10 09:59:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 08:59:43 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: Yes. It will eventually perform an AED-like calculation between multiple predictors (i.e. if you use 3 predictors, you would need support from at least 2 of them across all exons to get a value of 0.33). A value of 0 would be perfect concordance across all 3 predictors. --Carson
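The AED Daniel describes (0 when a prediction matches the evidence exactly, 1 when no evidence overlaps it) can be sketched at the base-pair level. This minimal sketch assumes the formulation from Eilbeck et al. (2009), AED = 1 - (SN + SP)/2, with sensitivity and specificity computed over covered bases; the function names are hypothetical and this is not MAKER's own implementation.

```python
# Minimal, illustrative AED sketch (not MAKER's code), assuming
# AED = 1 - (SN + SP) / 2 computed over covered base pairs.
# Intervals are 1-based, inclusive (start, end) pairs on one contig/strand.


def covered_bases(intervals):
    """Return the set of positions covered by a list of (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction, evidence):
    """AED of a predicted model against evidence: 0 = exact match, 1 = no overlap."""
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # fraction of evidence bases inside the prediction
    specificity = overlap / len(pred)  # fraction of predicted bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    exons = [(100, 199), (300, 399)]
    print(aed(exons, exons))           # 0.0: perfect agreement
    print(aed(exons, [(1000, 1100)]))  # 1.0: no overlap at all
    print(aed(exons, [(100, 199)]))    # 0.25: evidence covers only the first exon
```

In the last example the evidence lies entirely inside the prediction (SN = 1.0) but supports only half of its bases (SP = 0.5), giving 1 - 0.75 = 0.25.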
From mikael.durling at slu.se Mon Mar 10 09:08:16 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Mon, 10 Mar 2014 15:08:16 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models? Mikael
From carsonhh at gmail.com Mon Mar 10 10:16:59 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:16:59 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Message-ID: There is a value called abAED being calculated, which somewhat captures the concordance among the predictors. It is not currently printed in the GFF3, but it is used to identify the best non-overlapping ab initio prediction to put in the non-overlapping FASTA file. There are a couple of things I still need to do with it, though. It's not yet normalized to take into account the absence of a predictor in the cluster of overlapping predictions.
For example, if I have 2 predictors and 2 make perfectly matching calls and 1 makes no call, they get a score of 0 because there is perfect concordance between what's there, but I really should make it 0.33 because the absence of the third predictor is meaningful. The unnormalized concordance value is fine for deciding which overlapping model to keep in the file, but not for global comparison. --Carson
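Carson's normalization point can be made concrete with a small, hypothetical score: 1 minus the average per-exon fraction of predictors supporting a model. The two modes differ only in the denominator: predictors that made a call (unnormalized) versus all predictors that were run (normalized). The function name, the exact-exon matching rule, and the averaging are assumptions for illustration; MAKER's actual abAED calculation is not shown in this thread.

```python
# Hypothetical "abAED-like" concordance score among ab initio predictors,
# built only from the behaviour described in this thread. NOT MAKER's abAED.
from typing import Dict, Optional, Tuple

Exon = Tuple[int, int]
Model = Tuple[Exon, ...]


def concordance_score(candidate: Model,
                      calls: Dict[str, Optional[Model]],
                      normalize: bool = True) -> float:
    """Return 0 for perfect concordance, rising towards 1 as support drops.

    `calls` maps each predictor to its model in this cluster, or None if it
    made no call. With normalize=True the denominator is every predictor that
    was run, so a missing call counts against the model.
    """
    present = [m for m in calls.values() if m is not None]
    denominator = len(calls) if normalize else len(present)
    if denominator == 0 or not candidate:
        return 1.0
    support = 0.0
    for exon in candidate:
        # Fraction of predictors whose model contains this exact exon.
        support += sum(1 for m in present if exon in m) / denominator
    return 1.0 - support / len(candidate)


if __name__ == "__main__":
    model = ((100, 199), (300, 399))
    # 3 predictors: 2 make identical calls, 1 makes no call.
    calls = {"augustus": model, "snap": model, "genemark": None}
    print(round(concordance_score(model, calls, normalize=False), 2))  # 0.0
    print(round(concordance_score(model, calls, normalize=True), 2))   # 0.33
```

The unnormalized value is enough to rank overlapping models within one cluster, while the normalized value penalizes the silent third predictor, matching the 0.33 Carson mentions.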
From carsonhh at gmail.com Mon Mar 10 10:18:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:18:14 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Message-ID: Sorry, I meant to say "3 predictors and 2 make perfectly matching calls and 1 makes no call."
There are a couple of >things I still need to do with it to though. It?s not yet normalized to >take into account the absence of a predictor in the cluster of overlapping >predictions. For example, if I have 2 predictors and 2 make perfectly >matching calls and 1 makes no call, they get a score of 0 before I have >perfect concordance between what?s there, but I really should make it 0.33 >because the abscence of the third predictor is meaningful. The >unnormalized concordance value is fine for deciding which overlapping >model to keep in the file, but not for global comparison. > >?Carson > > > >On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" >wrote: > >>Ok. But that is not implemented no as far as I can tell from the source, >>right? Or is it reflected in the AED for the unsupported models? >> >>Mikael >> >>10 mar 2014 kl. 16:59 skrev Carson Holt : >> >>> Yes. It will eventually perform an AED like calculation between >>>multiple >>> predictors (i.e. if you use 3 predictors it, then you require support >>>by >>> at least 2 predictors across all exons to get a value of 0.33). A >>>value >>> of 0 would be perfect concordance across all 3 predictors. >>> >>> ?Carson >>> >>> >>> >>> >>> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" >>> >>> wrote: >>> >>>> Hi Carson and Daniel, >>>> >>>> That sounds more logical to me. Then it would be appropriate to >>>>change >>>> the comment of keep_preds in the generated config files. >>>> >>>> Would it make sense to make keep_preds a non-binary value to evaluate >>>>the >>>> concordance between ab initio models obtained from different >>>>predictors? >>>> That would assume that it is less likely to be a false positive when >>>>two >>>> or more predictors suggest the same unsported model? >>>> >>>> Mikael >>>> >>>> >>>> 10 mar 2014 kl. 16:51 skrev Carson Holt : >>>> >>>>> Actually that is false. The keep_preds option is still binary. Any >>>>> value >>>>> other than 0 sets it to true. 
From carsonhh at gmail.com Mon Mar 10 10:25:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:25:50 -0700 Subject: [maker-devel] annotation comparison aed plots Message-ID: I don't know about Michael's script, but I've always used eval. It produces sensitivity/specificity metrics. It assumes the first set of models is 100% correct, and then tells you the sensitivity/specificity values for the second set. It is therefore not a quality metric; instead, you should view it as a change metric.
Lower sensitivity tells you that models/exons have been lost between versions, and lower specificity tells you that models/exons have been gained. There will also be a lot of generic statistics on exon/intron distribution and UTR length. The AED values from the MAKER run can then be used independently to evaluate how well the models match the evidence. --Carson From: "Robert King (RRes-Roth)" Date: Monday, March 10, 2014 at 5:17 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] annotation comparison aed plots Dear Maker Developers, I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: UniProt proteins (search using species keyword "fusarium"); PubMed mRNA for the name of the organism; prior annotation reference transcripts. Annotation 2: UniProt proteins (search using species keyword "fusarium"); PubMed mRNA for the name of the organism; prior annotation reference transcripts; mRNA Trinity assembly (pasafly) of a different strain (only RNA-seq available). I'm not sure if it was a smart move to use the prior annotation reference transcripts? I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script, and I'm waiting on some Perl modules to be installed by our administrator to test it, and to test out the "Evaluator" and "compare" programs too. What do they do? Best Wishes Rob
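To make the "change metric" reading concrete, here is a minimal sketch of exon-level sensitivity and specificity in which the first annotation set is taken as the reference, as Carson describes for eval. It assumes exact coordinate matches between exons and is not Eval's own implementation.

```python
# Minimal sketch: exon-level sensitivity/specificity with the FIRST set as the
# reference. Exact coordinate matches only; an illustration, not Eval's code.
from typing import Set, Tuple

Exon = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def exon_sn_sp(reference: Set[Exon], other: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `other` relative to `reference`."""
    shared = len(reference & other)
    sensitivity = shared / len(reference) if reference else 0.0  # drops when exons are lost
    specificity = shared / len(other) if other else 0.0          # drops when exons are gained
    return sensitivity, specificity


if __name__ == "__main__":
    old = {("chr1", 100, 199, "+"), ("chr1", 300, 399, "+"), ("chr1", 900, 950, "-")}
    new = {("chr1", 100, 199, "+"), ("chr1", 300, 399, "+"),
           ("chr1", 1200, 1300, "+"), ("chr1", 1500, 1600, "+")}
    sn, sp = exon_sn_sp(old, new)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")  # sensitivity=0.67 specificity=0.50
```

Here one exon was lost and two were gained between versions, so both numbers fall below 1 without saying which annotation is better.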
From michael.s.campbell1 at gmail.com Mon Mar 10 09:50:53 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:50:53 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: One more point. The sensitivity, specificity, and accuracy produced by the compare_annotations_3.2.pl script are gene level, and overlap between annotation sets is defined very liberally, as at least one nucleotide of exon overlap. Mike
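Michael's liberal gene-level overlap rule (at least one shared exon nucleotide) might look like the following sketch. Ignoring strand, and the particular sensitivity/specificity definitions used here, are assumptions; this is not the code of compare_annotations_3.2.pl.

```python
# Hypothetical gene-level comparison: two genes overlap if ANY of their exons
# share at least one nucleotide on the same sequence. Strand is ignored (an
# assumption). Not the code of compare_annotations_3.2.pl.
from typing import List, Tuple

Exon = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive
Gene = List[Exon]


def genes_overlap(a: Gene, b: Gene) -> bool:
    """True if any exon of `a` shares >= 1 nucleotide with any exon of `b`."""
    return any(sa == sb and s1 <= e2 and s2 <= e1
               for sa, s1, e1 in a for sb, s2, e2 in b)


def gene_level_sn_sp(reference: List[Gene], other: List[Gene]) -> Tuple[float, float]:
    """Fraction of reference genes hit by `other`, and of `other` genes hit by reference."""
    hit_ref = sum(1 for r in reference if any(genes_overlap(r, o) for o in other))
    hit_other = sum(1 for o in other if any(genes_overlap(o, r) for r in reference))
    sn = hit_ref / len(reference) if reference else 0.0
    sp = hit_other / len(other) if other else 0.0
    return sn, sp


if __name__ == "__main__":
    ref = [[("chr1", 100, 199), ("chr1", 300, 399)], [("chr1", 5000, 5100)]]
    new = [[("chr1", 399, 450)], [("chr2", 100, 199)]]
    print(gene_level_sn_sp(ref, new))  # (0.5, 0.5)
```

A single shared base (position 399) is enough for the first pair to count as a match, which is why this kind of comparison is deliberately forgiving.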
I've produced two annotations using as >> evidence: >> >> >> >> Annotation 1: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> >> >> Annotation 2: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> mRNA trinity assembly pasafly of different strain (only RNA-seq available) >> >> >> >> I'm not sure if it was a smart move to use the prior annotation reference >> transcripts? >> >> >> >> I want to compare these two annotations and have produced AED scores. How >> do I generate summary stats/figures to compare annotations. You mentioned >> last year in a post Mike Campbell has a script to produce these, do you >> know if he will post it? I've got the Eval program and converted to gtf >> format using the provided script, just waiting on some perl modules to be >> installed by admin to test it. I'm waiting on some perl modules to be >> installed by our administrator to test out the "Evaluator" and "compare" >> programs too, what do they do? >> >> >> >> Best Wishes >> >> Rob >> >> -- >> This message has been scanned for viruses and >> dangerous content by *MailScanner* , and >> we believe but do not warrant that this e-mail and any attachments >> thereto do not contain any viruses. However, you are fully responsible for >> performing any virus scanning. >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Michael Campbell MS, RD. > Doctoral Candidate > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:585-3543 > > -- Michael Campbell MS, RD. 
Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 From cjfields at illinois.edu Mon Mar 10 09:52:50 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 10 Mar 2014 15:52:50 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in the number of models overall, which is worse. GeneMark had a very similar problem (a high number of false positives) and thus gave no marked improvement, whether used with both Augustus and SNAP or with Augustus alone. I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked on in the past. This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first. Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add these like a 'plugin'? chris From carsonhh at gmail.com Mon Mar 10 11:05:24 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 10:05:24 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints"
from evidence alignments (for example, we are working on EVM integration right now: http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't lose any true genes when using Augustus alone. Also, try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented. Thanks, Carson
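Carson's suggestion of converting another predictor's output to GFF3 and passing it in might look like this minimal sketch. Parsing geneid's own output is omitted, and both the match/match_part layout and the use of MAKER's pred_gff control-file option are assumptions to verify against the MAKER documentation before use.

```python
# Minimal sketch: write predicted gene models as GFF3 text (9 tab-separated
# columns). The match/match_part layout and feeding the file to MAKER via
# pred_gff are ASSUMPTIONS; parsing geneid output is not shown.
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive


def prediction_to_gff3(pred_id: str, seqid: str, strand: str,
                       exons: List[Exon], source: str = "geneid") -> List[str]:
    """Return GFF3 lines: one match spanning the model, one match_part per exon."""
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    lines = ["\t".join([seqid, source, "match", str(start), str(end), ".",
                        strand, ".", f"ID={pred_id};Name={pred_id}"])]
    for i, (s, e) in enumerate(exons, 1):
        lines.append("\t".join([seqid, source, "match_part", str(s), str(e), ".",
                                strand, ".", f"ID={pred_id}:exon{i};Parent={pred_id}"]))
    return lines


if __name__ == "__main__":
    print("##gff-version 3")
    for line in prediction_to_gff3("pred1", "scaffold_1", "+", [(300, 399), (100, 199)]):
        print(line)
```

Sorting the exons first keeps the parent feature's span correct even if the predictor reports exons out of order.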
From michael.s.campbell1 at gmail.com Mon Mar 10 09:47:50 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:47:50 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: Hi Robert, Here are the scripts that were mentioned before. The AED_cdf_generator.pl script is for making cumulative distribution function plots based on annotation edit distance. This script is quite simple and straightforward in its internals. The compare_annotations_3.2.pl script is for generating summary stats for annotations and will compare two annotations of the same assembly. You can run either script without arguments to get a usage statement. Thanks, Mike
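The quantity plotted by an AED cumulative distribution is simple to compute: for each AED threshold, the fraction of models at or below it, so a curve that rises earlier means more models are well supported by evidence. This is a hypothetical stand-in, not AED_cdf_generator.pl, and it assumes the AED values have already been extracted from the annotation.

```python
# Minimal sketch of an AED cumulative distribution: for each threshold t, the
# fraction of gene models with AED <= t. Not AED_cdf_generator.pl; extracting
# AED values from MAKER's GFF3 is not shown.
from bisect import bisect_right
from typing import Dict, Iterable, List


def aed_cdf(aed_values: Iterable[float], step: float = 0.025) -> Dict[float, float]:
    """Map each threshold in [0, 1] to the fraction of values <= threshold."""
    values: List[float] = sorted(aed_values)
    if not values:
        return {}
    n_steps = int(round(1.0 / step))
    cdf = {}
    for i in range(n_steps + 1):
        t = round(i * step, 3)
        cdf[t] = bisect_right(values, t) / len(values)  # count of values <= t
    return cdf


if __name__ == "__main__":
    annotation_1 = [0.05, 0.12, 0.30, 0.48, 1.0]
    print(aed_cdf(annotation_1, step=0.25))
    # {0.0: 0.0, 0.25: 0.4, 0.5: 0.8, 0.75: 0.8, 1.0: 1.0}
```

Computing one such table per annotation and plotting them on shared axes gives the side-by-side comparison Robert asked about.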
-- Michael Campbell MS, RD. Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- A non-text attachment was scrubbed... Name: AED_cdf_generator.pl Type: text/x-perl-script Size: 2579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl Type: text/x-perl-script Size: 29154 bytes Desc: not available URL: From sajeet at gmail.com Mon Mar 10 12:31:40 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Mon, 10 Mar 2014 11:31:40 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in genemark's predictions by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: > Adding a new predictor can take some time. It obviously requires some > coding. It's usually not too hard just to convert results to GFF3 and > then pass them in. Integrated support is really only beneficial for > predictors that can take "hints" from evidence alignments (for example we > are working on EVM integration right now - > http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give > problems just drop them. GeneMark really doesn't work very well on > genomes with complex intron/exon structure (and I really wouldn't use it > for anything but fungi). > > Make sure you are also giving sufficient protein evidence. Perhaps all > proteins from chicken and pigeon for example. Then you shouldn't find > loss of any true genes if just using Augustus. Also try not to use gene > count as an indicator of performance. The value is very deceptive, > especially if the genome assembly is fragmented. > > Thanks, > Carson > > > > On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: > > >I have been running MAKER 2.31 using Augustus and SNAP on an avian > >genome. Augustus gives pretty decent gene model predictions based on a > >custom model we have and the hints MAKER provides. However, SNAP seems > >to throw out a ton of false positives; in many cases this appears to > >cause erroneous gene fusions.
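Sajeet's workaround is to train GeneMark on a hard-masked copy of the genome. A minimal Python sketch of that conversion follows, assuming the common soft-masking convention (masked repeat bases are lowercase) and that hard masking means replacing those bases with N. hard_mask is a hypothetical helper, not a MAKER or GeneMark utility.

```python
from typing import Iterable, Iterator


def hard_mask(fasta_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (without newlines) with lowercase, soft-masked bases replaced by 'N'."""
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            yield line  # headers are passed through unchanged
        else:
            # Lowercase letters mark soft-masked repeats; uppercase bases are kept as-is.
            yield "".join("N" if base.islower() else base for base in line)
```

Write each yielded line, followed by a newline, to a separate FASTA file and use that copy only for GeneMark's self-training.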
Leaving out SNAP altogether however leads > >to a marked decrease in # models overall, which is worse. GeneMark had a > >very similar problem (high # false positives) and thus no marked > >improvement, either when used with both Augustus and SNAP or with > >Augustus alone. > > > >I have been exploring using geneid > >(http://genome.crg.es/software/geneid/) as an alternative, based on some > >feedback on another project I worked on in the past. This would be > >fed into MAKER using external GFF, but I wanted to see if anyone has > >tried geneid with MAKER first. > > > >Finally, how hard would it be to incorporate alternative callers into > >MAKER? For instance, would it be possible to add these like a 'plugin'? > > > >chris > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Mar 10 22:13:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 22:13:43 -0600 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
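Shane's question (quoted below) concerns fused gene models in a compact genome, which show up as unusually long introns. Whatever is tuned upstream, a quick way to see how many models are affected is to scan the GFF3 for long gaps between consecutive exons. A minimal Python sketch, assuming exon features carry a Parent= attribute naming their transcript and use GFF3's 1-based inclusive coordinates; long_introns is a hypothetical helper and the cutoff is whatever you consider implausible for your genome. It only reports suspect models; it does not stop Augustus from predicting them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def long_introns(gff3_lines: Iterable[str], max_intron: int) -> Dict[str, List[Tuple[int, int]]]:
    """Map transcript ID -> list of (start, end) introns longer than max_intron.

    Introns are inferred as the gaps between consecutive exons sharing a Parent.
    """
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff3_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, directives, blank lines and ##FASTA sequence lines.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    flagged: Dict[str, List[Tuple[int, int]]] = {}
    for parent, spans in exons.items():
        spans.sort()
        introns = [(prev_end + 1, next_start - 1)
                   for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
                   if next_start - prev_end - 1 > max_intron]
        if introns:
            flagged[parent] = introns
    return flagged
```

Single-exon transcripts produce no gaps and are never flagged.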
--Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that someone >> had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane > > From darasappan at gmail.com Mon Mar 10 14:14:03 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 10 Mar 2014 15:14:03 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing Message-ID: Hello, I've been running maker with different assembly files, reference files etc and I check the output by: 1. concatenating the gff files 2. concatenating the *transcripts.fasta files 3.
concatenating the *proteins.fasta files I'm noticing that when I ran maker twice with same parameters, the second time around, many of the output subdirectories do not have a *transcripts.fasta or *proteins.fasta file in it. There are 251 subdirectories and only 97 of them have all 3 output files. Maker log looks ok to me, but I've attached it here as well. What could be the reason for this? Thanks dhivya -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.o1813247.gz Type: application/x-gzip Size: 13857217 bytes Desc: not available URL: -------------- next part -------------- From sbrubaker at solazyme.com Tue Mar 11 11:06:57 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Tue, 11 Mar 2014 17:06:57 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com> Ok thank you. -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, March 10, 2014 9:14 PM To: Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Long introns from Augustus Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes. --Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? 
I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that >> someone had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane > > From carson.holt at genetics.utah.edu Thu Mar 13 10:00:06 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 13 Mar 2014 16:00:06 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Just resending this to the correct maker-devel address. When replying, please do not CC the incorrect maker-devel-bounce address. Thanks, Carson On 3/13/14, 9:56 AM, "Carson Holt" wrote: >FGENESH is not a heavily used tool, so depending on which version it is >(either too old or too new), output might be slightly different which >could cause incorrect parsing.
Could you tar up your maker.output folder, >and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >(then send me your user/guest ID after you upload). > >For the BLAST error, use BLAST+ instead. You are using blastall, which is >the old legacy version of NCBI BLAST. You can do this by setting the >blast type in maker_bopts.ctl and the location of executables in >maker_exe.ctl. > >Thanks, >Carson > > > >On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote: > >>Dear Maker users >> >> >>I ran maker (2.31) on a fungal genome and found out that it inserted the >>word SCALAR followed by a pair of brackets like this (0x22de7020) >>into the nucleotide sequence of some of the genes. This seems to >>be related to transcripts predicted by fgenesh_masked. >> >> >>Here is an example for one of the genes >> >> >>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>TTTCGACAAGC >> >>The same genome sequence was used for the first round of maker (2.10) >>without such a problem.
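Until the FGENESH parsing issue is tracked down, affected records can be located by scanning the transcript FASTA for characters that are not nucleotide codes. A minimal Python sketch, assuming the IUPAC nucleotide codes (either case) plus "-" are the only legitimate sequence characters; the helper names are hypothetical. Note that S, C and A are themselves IUPAC codes, so in a string like SCALAR(0x...) the first flagged character is the L.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Anything outside the IUPAC nucleotide alphabet (either case) or '-' is flagged.
NON_NUCLEOTIDE = re.compile(r"[^ACGTURYSWKMBDHVNacgturyswkmbdhvn-]")


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the first word of the header."""
    record_id, parts = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(parts)
            fields = line[1:].split()
            record_id, parts = (fields[0] if fields else ""), []
        elif line:
            parts.append(line)
    if record_id is not None:
        yield record_id, "".join(parts)


def find_non_nucleotide(fasta_lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (record_id, 0-based position, character) of the first bad character per record."""
    hits = []
    for record_id, seq in fasta_records(fasta_lines):
        match = NON_NUCLEOTIDE.search(seq)
        if match:
            hits.append((record_id, match.start(), match.group()))
    return hits
```

This only finds the damaged records; it does not explain why they were produced.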
I checked the sequence for the scaffold related to >>one of the affected transcripts and there was no error in the sequence. >>I am not sure what is causing this. The only error that I could spot in >>the output error file is the following >> >> >>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>contexts/frames of query sequences. >> >> >> >>Your help is appreciated >> >> >> >>HB >> >> >> >> >> >> > From carsonhh at gmail.com Thu Mar 13 10:14:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:14:54 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: Note protein/transcript fastas are only created when there are gene models to output to those files (so their absence means there were no gene models for that contig). Most sequences without protein/transcript fastas in your sample are very short and thus don't contain anything. The remaining ones either have no est2genome results or the est2genome alignments do not have a sufficient open reading frame to be turned into a gene model (false merging of regions by trinity can cause this, so make sure you use the jaccard index option when assembling reads with trinity to avoid this). You are using only the est2genome=1 option. This will result in a limited set of genes that can be used for training SNAP/Augustus (so not getting results on all contigs is expected). You really won't get much as far as results until you have one of the ab initio predictors turned on. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 11, 2014 at 8:52 AM To: Carson Holt Cc: Daniel Ence Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Alright done. My username is daras Thanks Dhivya On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > Input and compressed file of output.
> > Thanks, > Carson > > From: dhivya arasappan > Date: Monday, March 10, 2014 at 2:09 PM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Do you mean the whole maker output? > > Thanks > dhivya > > On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: > >> Could you upload everything here --> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then send us the link generated or your user ID. >> >> Thanks, >> Carson >> >> >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 1:50 PM >> To: Carson Holt , Daniel Ence >> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson and Daniel, >> >> I'm sending this across to you separately since maker list is blocking my >> email due to attachment size. >> >> As always, thanks for any guidance you can provide. >> Dhivya >> >> >> Begin forwarded message: >> >>> From: dhivya arasappan >>> Date: March 10, 2014 3:14:03 PM CDT >>> To: maker-devel at yandell-lab.org >>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>> >>> >>> Hello, >>> >>> I've been running maker with different assembly files, reference files etc >>> and I check the output by: >>> >>> 1. concatenating the gff files >>> 2. concatenating the *transcripts.fasta files >>> 3. concatenating the *proteins.fasta files >>> >>> I'm noticing that when I ran maker twice with same parameters, the second >>> time around, many of the output subdirectories do not have a >>> *transcripts.fasta or *proteins.fasta file in it. >>> There are 251 subdirectories and only 97 of them have all 3 output files. >>> Maker log looks ok to me, but I've attached it here as well. >>> >>> What could be the reason for this? >>> >>> Thanks >>> dhivya >>> >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Thu Mar 13 10:55:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:55:40 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Message-ID: The second time, it should have just started where it left off, so it would run faster (because the processing from the previous job counted towards the second one). The archived output you sent me had 21,183 proteins and transcripts. If you are using fasta_merge to collect them, just make sure the datastore.index file is not truncated or corrupt; otherwise it won't collect all the fastas from every contig. You can rebuild the datastore.index using the -dsindex flag with MAKER, if you want to check that. Also you can have maker just regenerate results without rerunning BLAST etc. by using the -a flag, if you want to just recalculate all results quickly (rebuilds all FASTA and GFF3 without redoing most analysis). --Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 10:47 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Thanks Carson for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ests based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters ( including est2genome=1). I got a very good result with > 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts.
My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output. When I reran maker, just changing the time allocated, it completed much faster, but is giving much fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene models to > output to those files (so their absence means there were no gene models for > that contig). Most sequences without protein/transcript fastas in your sample > are very short and thus don't contain anything. The remaining ones either have no > est2genome results or the est2genome alignments do not have a sufficient open > reading frame to be turned into a gene model (false merging of regions by > trinity can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a limited set > of genes that can be used for training SNAP/Augustus (so not getting results > on all contigs is expected). You really won't get much as far as results > until you have one of the ab initio predictors turned on.
> > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Alright done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output. >> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here --> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is blocking my >>> email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>>> >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference files etc >>>> and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3.
concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, the second >>>> time around, many of the output subdirectories do not have a >>>> *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 output files. >>>> Maker log looks ok to me, but I've attached it here as well. >>>> >>>> What could be the reason for this? >>>> >>>> Thanks >>>> dhivya >>>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Thu Mar 13 10:47:25 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 13 Mar 2014 11:47:25 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Thanks Carson for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ests based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters ( including est2genome=1). I got a very good result with > 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts. My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output. 
When I reran maker, just changing the time allocated, it completed much faster, but is giving much fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene > models to output to those files (so their absence means there were > no gene models for that contig). Most sequences without protein/ > transcript fastas in your sample are very short and thus don't > contain anything. The remaining ones either have no est2genome results or > the est2genome alignments do not have a sufficient open reading frame > to be turned into a gene model (false merging of regions by trinity > can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a > limited set of genes that can be used for training SNAP/Augustus (so > not getting results on all contigs is expected). You really won't > get much as far as results until you have one of the ab initio > predictors turned on. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta > files missing > > Alright done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output.
>> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta >> files missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >> > >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta >>> files missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is >>> blocking my email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference >>>> files etc and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3. concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, >>>> the second time around, many of the output subdirectories do not >>>> have a *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 >>>> output files. Maker log looks ok to me, but I've attached it >>>> here as well. >>>> >>>> What could be the reason for this?
>>>> >>>> Thanks >>>> dhivya >>>> >>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: transcripts.cat.fasta.old.gz Type: application/x-gzip Size: 7927581 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proteins.cat.fasta.old.gz Type: application/x-gzip Size: 3668381 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 13 12:53:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 12:53:05 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> <672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com> <5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com> Message-ID: For future reference, I suggest using the .../maker/bin/fasta_merge tool to merge based on the datastore.index rather than other command-line-based methods. It will handle the multiple fasta types that are produced in the results, and will validate with the datastore.index file. Example: fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log The same is also true when merging gff3 files. gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log Thanks, Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 12:48 PM To: Carson Holt Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Ah, I forgot that some were called superscaffolds. That is a difference between the old and new assembly. This was definitely the issue. Thanks and sorry for the mix-up.
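Carson's recommendation (fasta_merge -d with the master datastore index) is the safer way to collect results. For a quick independent cross-check of the counts, a recursive file search avoids the sca* prefix problem in the quoted messages below, where the glob silently skipped the superscaffolds. A minimal Python sketch; count_fasta_records is a hypothetical helper, and the "*transcripts.fasta" pattern is an assumption based on the glob used in this thread. It may match more than one FASTA type per contig, so narrow it to the file type you actually want to count.

```python
from pathlib import Path
from typing import Dict


def count_fasta_records(datastore_dir: str, pattern: str = "*transcripts.fasta") -> Dict[str, int]:
    """Count '>' header lines in every file matching `pattern` at any depth under datastore_dir."""
    counts: Dict[str, int] = {}
    # rglob descends into every subdirectory, regardless of how contigs are named.
    for path in sorted(Path(datastore_dir).rglob(pattern)):
        with path.open() as handle:
            counts[str(path)] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

For example, counts = count_fasta_records("opgenResult+scaffoldsLengthsLess200_datastore") gives len(counts) files and sum(counts.values()) records, which can be compared against the fasta_merge output.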
Dhivya On Mar 13, 2014, at 12:51 PM, Carson Holt wrote: > Note that your command does not capture everything because not all scaffolds > start with the name "scaffold". > > This works though --> > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, March 13, 2014 at 11:34 AM > To: Carson Holt > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Am I looking in the wrong place for my fasta files? I looked here: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc > -l > > I see only 97 such files- so 97 contigs with transcripts.fasta files? > > When I count the number of sequences in all these files, I get 514 sequences. > > grep -c '^>' > opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' > -f 2|awk '{total+=$0}END{print total}' > > Could you tell me how and where you are getting the 21,183 transcripts? > > thanks > dhivya > > On Mar 13, 2014, at 12:21 PM, Carson Holt wrote: > >> This is what I see in your uploaded data. There are 21,183 transcripts from >> 201 contigs. Then there are 707 contigs with no gene models. >> >> --Carson >> >> >> From: Carson Holt >> Date: Thursday, March 13, 2014 at 11:11 AM >> To: dhivya arasappan >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> "as you saw from the output I uploaded before, the output certainly was much >> less than 20,000 transcripts" >> >> Actually there were 21,183 in the output you uploaded. I saw no loss of >> entries. >> >> --Carson >> >> From: dhivya arasappan >> Date: Thursday, March 13, 2014 at 11:09 AM >> To: Carson Holt >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> The datastore.index file looks fine- it has a started and finished status for >> my 980 scaffolds. I reran with increased time twice.
Second time around, I >> actually deleted the entire output directory to make sure it runs all over >> again. It still seemed to complete within a day. As you saw from the output >> I uploaded before, the output certainly was much less than 20,000 >> transcripts. Given that I was seeing great results for an older version of my >> assembly, I'm puzzled as to why my results are worse this time around. Any >> suggestions of what to check or what I can do to see improved results would >> be really helpful. >> >> I do know that I went from ~4% gaps to ~6% gaps in my new assembly- other >> than that, it's better in every way. Could this cause such a dramatic >> difference in results? >> >> Thanks >> dhivya >> >> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote: >> >>> The second time, it should have just started where it left off, so it would >>> run faster (because the processing from the previous job counted towards the >>> second one). The archived output you sent me had 21,183 proteins and >>> transcripts. If you are using fasta_merge to collect them, just make >>> sure the datastore.index file is not truncated or corrupt; otherwise it won't >>> collect all the fastas from every contig. You can rebuild the >>> datastore.index using the -dsindex flag with MAKER, if you want to check >>> that. Also you can have maker just regenerate results without rerunning >>> BLAST etc. by using the -a flag, if you want to just recalculate all results >>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis). >>> >>> --Carson >>> >>> >>> From: dhivya arasappan >>> Date: Thursday, March 13, 2014 at 10:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Thanks Carson for the response. I understand that est2genome=1 does not use >>> any ab initio gene predictions, but simply identifies ests based on >>> alignment.
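Carson suggests checking that the datastore index is not truncated or corrupt. One way to do that is to look at the last status recorded for each contig. A minimal Python sketch, assuming the master datastore index log holds tab-separated lines of contig ID, datastore directory and status, with FINISHED marking completion; that column layout is an assumption, so compare it against your own log before relying on it. check_datastore_index is a hypothetical helper, not part of MAKER.

```python
from typing import Dict, Iterable, Tuple


def check_datastore_index(index_lines: Iterable[str]) -> Tuple[Dict[str, str], int]:
    """Return ({contig_id: last_status} for contigs not FINISHED, count of malformed lines).

    Assumes tab-separated lines: contig_id, datastore directory, status.
    """
    last_status: Dict[str, str] = {}
    malformed = 0
    for raw in index_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            malformed += 1  # e.g. a line cut short by a truncated write
            continue
        last_status[cols[0]] = cols[2]  # later lines override earlier ones
    unfinished = {cid: s for cid, s in last_status.items() if s != "FINISHED"}
    return unfinished, malformed
```

Any contig reported as unfinished, or a nonzero malformed count, is a reason to rebuild the index with -dsindex as Carson describes before merging.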
I'm a little confused because I ran maker on my assembly before, >>> using the same parameters (including est2genome=1). I got a very good >>> result with more than 20,000 transcripts and proteins. >>> >>> Then I was able to get an improved assembly, where many scaffolds were >>> combined into superscaffolds. So I reran maker on this assembly. Same >>> parameters, same transcriptome and proteins files. Now, I see such >>> drastically different results: Only 500+ genes and transcripts. My >>> scaffolds are now bigger than before, so I'm not sure how this is happening. >>> These were the results I sent you. >>> >>> Another odd thing I noticed (and I am hesitant to report this because >>> perhaps it is due to some sort of error on my part): I ran maker on the >>> improved assembly the first time and maker did not complete in the 48 hours >>> I allocated. But I had 19,000+ transcripts in the unfinished output. When >>> I reran maker, just changing the time allocated, it completed much faster, >>> but is giving far fewer transcripts and proteins as output. Could >>> something like this happen? If not, then I'm guessing I must have changed >>> something, although I'm pretty sure that I did not change anything other than >>> the time allocated. I've attached the transcripts and proteins files from the >>> first time I ran maker on my improved assembly. >>> >>> Thanks again for your help >>> Dhivya >>> >>> >>> >>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: >>> >>>> Note protein/transcript FASTAs are only created when there are gene models >>>> to output to those files (so their absence means there were no gene models >>>> for that contig). Most sequences without protein/transcript FASTAs in your >>>> sample are very short and thus don't contain anything.
The remaining ones either >>>> have no est2genome results or their est2genome alignments do not have >>>> a sufficient open reading frame to be turned into a gene model (false merging >>>> of regions by Trinity can cause this, so make sure you use the Jaccard >>>> index option when assembling reads with Trinity to avoid this). >>>> >>>> You are using only the est2genome=1 option. This will result in a limited >>>> set of genes that can be used for training SNAP/Augustus (so not getting >>>> results on all contigs is expected). You really won't get much as far as >>>> results until you have one of the ab initio predictors turned on. >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> From: dhivya arasappan >>>> Date: Tuesday, March 11, 2014 at 8:52 AM >>>> To: Carson Holt >>>> Cc: Daniel Ence >>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Alright done. My username is daras >>>> >>>> Thanks >>>> Dhivya >>>> >>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: >>>> >>>>> Input and compressed file of output. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: dhivya arasappan >>>>> Date: Monday, March 10, 2014 at 2:09 PM >>>>> To: Carson Holt >>>>> Cc: Daniel Ence >>>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>>> missing >>>>> >>>>> Hi Carson, >>>>> >>>>> Do you mean the whole maker output? >>>>> >>>>> Thanks >>>>> dhivya >>>>> >>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >>>>> >>>>>> Could you upload everything here --> >>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>> >>>>>> Then send us the link generated or your user ID.
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: dhivya arasappan >>>>>> Date: Monday, March 10, 2014 at 1:50 PM >>>>>> To: Carson Holt , Daniel Ence >>>>>> >>>>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>>>>> missing >>>>>> >>>>>> Hi Carson and Daniel, >>>>>> >>>>>> I'm sending this across to you separately since maker list is blocking my >>>>>> email due to attachment size. >>>>>> >>>>>> As always, thanks for any guidance you can provide. >>>>>> Dhivya >>>>>> >>>>>> >>>>>> Begin forwarded message: >>>>>> >>>>>>> From: dhivya arasappan >>>>>>> Date: March 10, 2014 3:14:03 PM CDT >>>>>>> To: maker-devel at yandell-lab.org >>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>>>>> missing >>>>>>> >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I've been running maker with different assembly files, reference files >>>>>>> etc and I check the output by: >>>>>>> >>>>>>> 1. concatenating the gff files >>>>>>> 2. concatenating the *transcripts.fasta files >>>>>>> 3. concatenating the *proteins.fasta files >>>>>>> >>>>>>> I'm noticing that when I ran maker twice with same parameters, the >>>>>>> second time around, many of the output subdirectories do not have a >>>>>>> *transcripts.fasta or *proteins.fasta file in it. >>>>>>> There are 251 subdirectories and only 97 of them have all 3 output >>>>>>> files. Maker log looks ok to me, but I've attached it here as well. >>>>>>> >>>>>>> What could be the reason for this? >>>>>>> >>>>>>> Thanks >>>>>>> dhivya >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
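The counting problem in this thread (a fixed `*/*/sca*/` glob that misses contig directories not named "scaffold...") can be avoided by letting `find` search the datastore at any depth. The sketch below uses hypothetical helpers, not MAKER tools. It assumes that per-contig FASTA names end in `transcripts.fasta`, as the globs above imply. It also assumes that the datastore index is tab-separated with a status such as `STARTED` or `FINISHED` in the third column, so check that against your own index file before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a MAKER datastore; not part of MAKER.

# Count *transcripts.fasta files anywhere under a datastore directory and the
# total number of FASTA records in them. find recurses to any depth and does
# not care what the contig directories are called.
count_transcripts() {
    local dir=$1 files records
    files=$(find "$dir" -type f -name '*transcripts.fasta' | wc -l | tr -d ' ')
    records=$(find "$dir" -type f -name '*transcripts.fasta' -exec cat {} + | grep -c '^>')
    echo "$files files, $records transcripts"
}

# List contigs that were STARTED but never FINISHED, assuming a tab-separated
# index with lines of the form: contig <TAB> path <TAB> STATUS
unfinished_contigs() {
    awk -F'\t' '
        $3 == "STARTED"  { started[$1] = 1 }
        $3 == "FINISHED" { finished[$1] = 1 }
        END { for (c in started) if (!(c in finished)) print c }
    ' "$1" | sort
}
```

For example, `count_transcripts opgenResult+scaffoldsLengthsLess200_datastore` counts files and records regardless of contig naming. Any contig that `unfinished_contigs` prints for the index file is worth checking before collecting results with `fasta_merge`.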
URL: From cjfields at illinois.edu Thu Mar 13 15:04:23 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 13 Mar 2014 21:04:23 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas > wrote: One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt > wrote: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence. Perhaps all proteins from chicken and pigeon for example. Then you shouldn't find loss of any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.
Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when used with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked with in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a "plugin"? > >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfierst at uoregon.edu Fri Mar 14 10:06:26 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 14 Mar 2014 09:06:26 -0700 Subject: [maker-devel] associating gene names between related strains Message-ID: Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name.
I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Fri Mar 14 11:32:02 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Mar 2014 17:32:02 +0000 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species. Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jfierst at uoregon.edu Fri Mar 14 12:01:16 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 14 Mar 2014 11:01:16 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I will try it today. Thanks for the quick reply! On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence wrote: > Hi Janna, So do you have one strain that you want to use as the > reference for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name. I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri Mar 14 12:02:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:02:48 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: maker_map_ids does a translation (e.g. change gene-A to smug1), so you need to know which genes you want to translate names to (two-column input file, column 1 -> original ID, column 2 -> new ID). I'm not sure est_forward is the best way to do this, although I do think maker_map_ids is the tool to use in the end. The question is how to make a list of IDs to translate as the input to maker_map_ids? I would actually just use BLASTP against the reference strain, and then do reciprocal best BLAST hits. To do this you BLAST your reference proteins against your maker proteins. Then do the opposite, BLAST your maker proteins against your reference proteins. If they are each other's best hit, then they are most likely orthologous, and you can safely make a two-column entry for the maker_map_ids input (e.g. maker-gene-1 translates into smug1). -Carson From: Daniel Ence Date: Friday, March 14, 2014 at 11:32 AM To: Janna Fierst , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] associating gene names between related strains Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.
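The reciprocal-best-hit step Carson describes can be sketched with awk over BLAST tabular output. This is a minimal sketch, not a MAKER tool, and the function names are hypothetical. It assumes both searches were run with `-outfmt 6`, which puts the bitscore in column 12. On a tied bitscore it keeps the first hit seen; a real pipeline may want a stricter tie rule or an e-value cutoff.

```bash
#!/usr/bin/env bash
# Hypothetical reciprocal-best-hit helpers for BLAST tabular output
# (-outfmt 6: qseqid sseqid ... evalue bitscore, so bitscore is column 12).

# Best subject per query by bitscore; on ties the first hit seen is kept.
best_hits() {
    awk -F'\t' '!($1 in best) || $12 + 0 > score[$1] {
        best[$1] = $2; score[$1] = $12 + 0
    }
    END { for (q in best) print q "\t" best[q] }' "$1"
}

# Print "maker_id<TAB>reference_id" for pairs that are each other's best hit.
#   $1: MAKER proteins searched against the reference (queries = MAKER IDs)
#   $2: reference proteins searched against MAKER (queries = reference IDs)
reciprocal_best_hits() {
    { best_hits "$1"; echo "#"; best_hits "$2"; } | awk -F'\t' '
        $0 == "#"     { reverse = 1; next }
        !reverse      { fwd[$1] = $2; next }
        fwd[$2] == $1 { print $2 "\t" $1 }
    ' | sort
}
```

With BLAST+, the two tables could come from `makeblastdb -in ref.fasta -dbtype prot -out ref` followed by `blastp -query maker_proteins.fasta -db ref -outfmt 6 > maker_vs_ref.tsv`, plus the reverse search. `reciprocal_best_hits maker_vs_ref.tsv ref_vs_maker.tsv > id_map.tsv` then gives a two-column old-ID/new-ID table of the kind described above.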
Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Mar 14 12:43:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:43:41 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> References: <3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com> <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> Message-ID: I'm glad you were able to fix it. I'll check to see why it was failing as well. Thanks, Carson From: dhivya arasappan Date: Friday, March 14, 2014 at 10:16 AM To: Carson Holt Subject: Re: Error when running maker2zff script Kindly ignore my previous question. I was able to manipulate the scaffold names in the gff file to get maker2zff to work.
Thanks dhivya On Mar 14, 2014, at 10:55 AM, dhivya arasappan wrote: > My message got flagged by the maker list again, so I'm forwarding this > separately to you. Is there a better way to send biggish files? > > > Thank you > Dhivya > > > > Begin forwarded message: > >> From: dhivya arasappan >> Subject: Error when running maker2zff script >> Date: March 13, 2014 at 8:35:27 PM CDT >> To: Carson Holt , maker-devel at yandell-lab.org >> >> Hi Carson, >> >> I used gff3_merge to create my gff file from maker output. I've attached it >> here. But when I run maker2zff on it, I get the following error: >> >> Can't use an undefined value as an ARRAY reference at >> /opt/apps/maker/2.30/bin/maker2zff line 177, line 7294251. >> >> It produces an incomplete output file and it looks like it may be running >> into problems when it encounters scaffold3%2F0. I'm wondering if it's having >> problems with my scaffold names. There seem to be some inconsistencies >> because it's referred to as scaffold3%2F0 and scaffold3/0 in the gff file. >> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just >> fine. I did try replacing the scaffold names in the gff file, but still get >> the same error. Any ideas? >> >> Substitution command I used, for your reference: sed 's/3\%2F/3_/g' gfffile | sed 's/\//\_/' > mod.gfffile >> >> Thanks >> Dhivya >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Mar 14 13:25:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 13:25:58 -0600 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: We can look into it.
-Carson From: "Fields, Christopher J" Date: Thursday, March 13, 2014 at 3:04 PM To: Sajeet Haridas Cc: Carson Holt , " List" Subject: Re: [maker-devel] geneid (or alternative ab initio predictors) That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote: > One of the problems I have found with GeneMark is that it does not understand > a soft-masked genome. Hence, the self-training is incorrect. I have found > marked improvement in GeneMark's predictions by running the training on a > hard-masked genome. > > > On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: >> Adding a new predictor can take some time. It obviously requires some >> coding. It's usually not too hard just to convert results to GFF3 and >> then pass it in. Integrated support is really only beneficial for >> predictors that can take "hints" from evidence alignments (for example we >> are working on EVM integration right now - >> http://evidencemodeler.sourceforge.net >> ). If SNAP and GeneMark give >> problems just drop them. GeneMark really doesn't work very well on >> genomes with complex intron/exon structure (and I really wouldn't use it >> for anything but fungi). >> >> Make sure you are also giving sufficient protein evidence. Perhaps all >> proteins from chicken and pigeon for example. Then you shouldn't find >> loss of any true genes if just using Augustus. Also try not to use gene >> count as an indicator of performance. The value is very deceptive, >> especially if the genome assembly is fragmented.
>> >> Thanks, >> Carson >> >> >> >> On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >> >>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian >>> >genome. Augustus gives pretty decent gene model predictions based on a >>> >custom model we have and the hints MAKER provides. However, SNAP seems >>> >to throw out a ton of false positives; in many cases this appears to >>> >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >>> >to a marked decrease in # models overall, which is worse. GeneMark had a >>> >very similar problem (high # false positives) and thus no marked >>> >improvement, either when used with both Augustus and SNAP or with >>> >Augustus alone. >>> > >>> >I have been exploring using geneid >>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some >>> >feedback on another project I worked with in the past. This would be >>> >fed into MAKER using external GFF, but I wanted to see if anyone has >>> >tried geneid with MAKER first. >>> > >>> >Finally, how hard would it be to incorporate alternative callers into >>> >MAKER? For instance, would it be possible to add these like a "plugin"? >>> > >>> >chris >>> >_______________________________________________ >>> >maker-devel mailing list >>> >maker-devel at box290.bluehost.com >>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed...
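Sajeet's workaround, self-training GeneMark on a hard-masked copy of the genome, only needs the lowercase (soft-masked) bases turned into N. This is a minimal sketch that assumes repeats are marked solely by lowercase letters, as in a typical soft-masked assembly. The function name and file names are placeholders. It does not cover GeneMark's own training options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a soft-masked FASTA (repeats in lowercase) into
# a hard-masked one by replacing every lowercase base with N. Header lines
# (starting with ">") are left untouched. LC_ALL=C restricts [a-z] to plain
# ASCII lowercase letters.
hard_mask() {
    LC_ALL=C sed '/^>/!s/[a-z]/N/g' "$1"
}
```

For example, `hard_mask genome.soft.fasta > genome.hard.fasta` produces a hard-masked copy that can be used only for GeneMark's self-training step.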
URL: From cjfields at illinois.edu Fri Mar 14 20:22:55 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 15 Mar 2014 02:22:55 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu> Not an issue at the moment; I'll likely supply these via GFF for now. If needed I can work off an svn checkout and send along a patch should I ever manage to eke out time to work on it. chris On Mar 14, 2014, at 2:25 PM, Carson Holt > wrote: We can look into it. -Carson From: "Fields, Christopher J" > Date: Thursday, March 13, 2014 at 3:04 PM To: Sajeet Haridas > Cc: Carson Holt >, "> List" > Subject: Re: [maker-devel] geneid (or alternative ab initio predictors) That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas > wrote: One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt > wrote: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems just drop them.
GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence. Perhaps all proteins from chicken and pigeon for example. Then you shouldn't find loss of any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented. Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when used with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked with in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a "plugin"? > >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carson.holt at genetics.utah.edu Mon Mar 17 13:45:15 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Mar 2014 19:45:15 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: I have attached 4 files for you to place in the .../maker/Widgets/ directory. The *blast*.pm files will suppress the BLAST+ failures you are getting (alternatively you can just downgrade to BLAST+ 2.2.27 to get the same effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. In the latest release NCBI redid all their warnings and error codes, so it spits out a lot of garbage and fails with different messages than it did before. For example, BLAST now warns you every time it encounters a FASTA header with a comment (virtually every FASTA entry in existence falls in this category), so your screen will be awash with meaningless warning messages. The fgenesh.pm file will fix the other failure, which only occurs if you use fgenesh simultaneously with the correct_est_fusion=1 option. No other predictors are affected. Thanks, Carson On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: >Dear Carson > >Sorry for the late reply. I was away for a couple of days. I have uploaded >the output files plus control and error output on the FTP site that you >provided >The user ID is borhanh > >I used blast+ for this run. > > > > >Regards > > >HB > > > > > > > > >On 14-03-13 10:00 AM, "Carson Holt" wrote: > >>Just resending this to the correct maker-devel address. Please when >>replying, do not CC the incorrect maker-devel-bounce address. >> >>Thanks, >>Carson >> >> >>On 3/13/14, 9:56 AM, "Carson Holt" wrote: >> >>>FGENESH is not a heavily used tool, so depending on which version it is >>>(either too old or too new), output might be slightly different, which >>>could cause incorrect parsing.
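Because the stray `SCALAR(0x...)` text sits inside otherwise valid sequence, a quick scan for characters outside the IUPAC nucleotide alphabet is one way to list affected transcripts, both before and after dropping in the patched modules. This is a hypothetical helper, not part of MAKER. It assumes the ID is the first whitespace-delimited word of each header and that sequence lines have no Windows line endings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ID of every FASTA record whose sequence lines
# contain anything other than IUPAC nucleotide codes, e.g. a Perl reference
# string such as "SCALAR(0x23418b90)" that leaked into a transcript.
# Each offending ID is printed once.
find_bad_sequences() {
    LC_ALL=C awk '
        /^>/ { id = substr($1, 2); next }
        /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]/ && !(id in seen) {
            seen[id] = 1
            print id
        }
    ' "$1"
}
```

For example, running `find_bad_sequences` on a merged transcripts FASTA should print nothing once the fgenesh parsing issue is fixed.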
Could you tar up your maker.output >>>folder, >>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>(then send me your user/guest ID after you upload). >>> >>>For the BLAST error, use BLAST+ instead. You are using blastall, which >>>is the old legacy version of NCBI BLAST. You can do this by setting the >>>blast type in maker_bopts.ctl and the location of executables in >>>maker_exe.ctl. >>> >>>Thanks, >>>Carson >>> >>> >>> >>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>wrote: >>> >>>>Dear Maker users >>>> >>>> >>>>I ran maker (2.31) on a fungal genome and found out that it inserted >>>>the word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>into the nucleotide sequence of some of the genes. This seems to >>>>be related to transcripts predicted by fgenesh_masked. >>>> >>>> >>>>Here is an example for one of the genes >>>> >>>> >>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>TTTCGACAAGC >>>> >>>>
The same genome sequence was used for the first round of maker (2.10) >>>>without such a problem. I checked the sequence for the scaffold related to >>>>one of the affected transcripts and there was no error in the sequence. >>>>I am not sure what is causing this. The only error that I could spot in >>>>the output error file is the following >>>> >>>> >>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>contexts/frames of query sequences. >>>> >>>> >>>> >>>>Your help is appreciated >>>> >>>> >>>> >>>>HB >>>> >>>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: blastn.pm Type: text/x-perl-script Size: 8112 bytes Desc: blastn.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastx.pm Type: text/x-perl-script Size: 8218 bytes Desc: blastx.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fgenesh.pm Type: text/x-perl-script Size: 19744 bytes Desc: fgenesh.pm URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tblastx.pm Type: text/x-perl-script Size: 9113 bytes Desc: tblastx.pm URL: From carsonhh at gmail.com Mon Mar 17 15:14:42 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Mar 2014 15:14:42 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID: Just an update on this. I've fixed the maker2zff script to handle the issues seen. Looking at this actually brought to light another issue. There is inconsistent escape character specification for GFF3 in column 1 (the seqid), column 9 (the ID and Target attributes), as well as the FASTA ID for internal sequence. We're updating the GFF3 spec to clarify this so that the same ID is treated the same way for character escaping everywhere it appears.
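Keeping contig IDs inside the GFF3 column-1 character set can be checked, and if needed enforced, on the assembly FASTA before annotation. Doing this sidesteps the `scaffold3/0` versus `scaffold3%2F0` mismatch entirely. This is a minimal sketch with hypothetical function names. It assumes the ID is the first whitespace-delimited word of each header. Replacing each disallowed character with `_` is an arbitrary choice, and any renaming has to happen before MAKER runs so that the FASTA and every GFF3 file agree.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for keeping contig IDs inside the character set that
# GFF3 allows unescaped in column 1: a-zA-Z0-9.:^*$@!+_?-|
# The ID is assumed to be the first whitespace-delimited word of a header.

# Print IDs that contain any other character (e.g. "/" or "%").
unsafe_ids() {
    LC_ALL=C awk '/^>/ {
        id = substr($1, 2)
        if (id ~ /[^a-zA-Z0-9.:^*$@!+_?|-]/) print id
    }' "$1"
}

# Write a copy of the FASTA in which every disallowed ID character becomes
# "_" and header descriptions are dropped. Check the result for duplicates,
# e.g. with: grep '^>' out.fa | sort | uniq -d
sanitize_ids() {
    LC_ALL=C awk '
        /^>/ {
            id = substr($1, 2)
            gsub(/[^a-zA-Z0-9.:^*$@!+_?|-]/, "_", id)
            print ">" id
            next
        }
        { print }
    ' "$1"
}
```

For example, `sanitize_ids assembly.fasta > assembly.safe.fasta` turns `scaffold3/0` into `scaffold3_0` and leaves `SCAFFOLD3_95` unchanged.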
To be safe though, only use these characters in your contig IDs for the assembly when using any tool that reads or outputs GFF3 --> a-zA-Z0-9.:^*$@!+_?-| Any character not in that set has a high chance of breaking some downstream tool. For now, just assume that the strict interpretation from the GFF3 spec for column 1 must be applied to all IDs everywhere (see below). >>Column 1: "seqid" >>The ID of the landmark used to establish the coordinate system for the >>current feature. >>IDs may contain any characters, but must escape any characters not in >>the set [a-zA-Z0-9.:^*$@!+_?-|]. >>In particular, IDs may not contain unescaped whitespace and must not >>begin with an unescaped ">". Thanks, Carson On 3/13/14, 7:35 PM, "dhivya arasappan" wrote: >Hi Carson, > >I used gff3_merge to create my gff file from maker output. I've >attached it here. But when I run maker2zff on it, I get the following >error: > >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/2.30/bin/maker2zff line 177, line 7294251. > >It produces an incomplete output file and it looks like it may be >running into problems when it encounters scaffold3%2F0. I'm wondering >if it's having problems with my scaffold names. There seem to be some >inconsistencies because it's referred to as scaffold3%2F0 and >scaffold3/0 in the gff file. It goes through other scaffolds like >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine. I did try replacing the >scaffold names in the gff file, but still get the same error. Any >ideas? > >Substitution command I used, for your reference: sed 's/3\%2F/3_/g' gfffile | sed 's/\//\_/' > mod.gfffile > >Thanks >Dhivya > From darasappan at gmail.com Mon Mar 17 15:20:18 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 17 Mar 2014 16:20:18 -0500 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID: Awesome! Thanks Carson. Dhivya On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt wrote: > Just an update on this.
I've fixed the maker2zff script to handle the > issues seen. Looking at this actually brought to light another issue. > There is inconsistent escape character specification for GFF3 in column 1 > (the source ID), column 8 (the attributes ID and Target_ID), as well as > the FASTA ID for internal sequence. We're updating the GFF3 spec to > clarify this so that everywhere you see the same ID getting treated the > same way for character escaping. > > To be safe though, only use these characters in your contig IDs for the > assembly when using any tool that reads or outputs GFF3 --> > a-zA-Z0-9.:^*$@!+_?-| > > Any character not in that set has a high chance of breaking some > downstream tool. For now just assume the strict interpretation from the > GFF3 spec for column 1, must be used on all IDs everywhere (see below). > > >>Column 1: "seqid" > >>The ID of the landmark used to establish the coordinate system for the > >>current feature. > >>IDs may contain any characters, but must escape any characters not in > >>the set [a-zA-Z0-9.:^*$@!+_?-|]. > >>In particular, IDs may not contain unescaped whitespace and must not > >>begin with an unescaped ">". > > > Thanks, > Carson > > > > On 3/13/14, 7:35 PM, "dhivya arasappan" wrote: > > >Hi Carson, > > > >I used gff3_merge to create my gff file from maker output. I've > >attached it here. But when I run maker2zff on it, I get the following > >error: > > > >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/ > >2.30/bin/maker2zff line 177, line 7294251. > > > >It produces an incomplete output file and it looks like it may be > >running into problems when it encounters scaffold3%2F0. I'm wondering > >if its having problems with my scaffold names. There seem to be some > >inconsistencies because it's referred to as scaffold3%F0 and > >scaffold3/0 in the gff file. It goes through other scaffolds like > >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine. 
I did try replacing the > >scaffold names in the gff file, but still get the same error. Any > >ideas? > > > >Substitution command I used, for your reference: sed 's/3\%2F/3_/g' > >gfffile| sed 's/\//\_/' > mod.gfffile > > > >Thanks > >Dhivya > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.hoeppner at bils.se Tue Mar 18 05:43:43 2014 From: marc.hoeppner at bils.se (Marc Höppner) Date: Tue, 18 Mar 2014 12:43:43 +0100 Subject: [maker-devel] Maker changes 2.30-2.31 Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Hi, I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31? There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily, 2.31 however has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run. I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.. Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at bils.se From carsonhh at gmail.com Tue Mar 18 09:07:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 09:07:07 -0600 Subject: [maker-devel] Maker changes 2.30-2.31 In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Message-ID: Attached. Also make sure you are using the tar ball from the lab website and not the prerelease from the subversion repository.
Thanks, Carson On 3/18/14, 5:43 AM, "Marc Höppner" wrote: >Hi, > >I have observed a few oddities with our installation of maker 2.31 and >was therefore wondering if there is a change log somewhere to get some >information on what, if anything, was changed between 2.30 and 2.31? > >There is of course a good chance that the issues I am seeing (pipeline >locking up) are related to our setup and not necessarily Maker - but I'd >like to make sure, if possible. Both versions use the exact same external >binaries etc, and were run on the same data. 2.30 is running along >happily, 2.31 however has randomly locked up. I should perhaps also say >that I am running on SL 6.2 and am using mpich2 for the MPI run. > >I haven't done any more systematic testing so far, but will probably do >so if there is no "obvious" reason why Maker 2.31 should behave >differently.. > >Cheers, > >Marc > > > > >Marc P. Hoeppner, PhD >Department for Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoeppner at bils.se > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svn_log.txt URL: From fbarreto at ucsd.edu Tue Mar 18 10:08:47 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:08:47 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP Message-ID: Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 18 10:14:29 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 10:14:29 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: That sounds good. 1,500 initial models should be more than sufficient for the first round of training. -Carson From: Felipe Barreto Date: Tuesday, March 18, 2014 at 10:08 AM To: MAKER group Subject: [maker-devel] Size of initial EST training set for SNAP Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Mar 18 10:16:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 18 Mar 2014 16:16:20 +0000 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: Hi Felipe, I think 1500 models sounds like a good size set with which to train SNAP. I think that SNAP expects ~1000 models for training. The only other comment on the approach is perhaps that using only one ab-initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.
Good luck and let us know if there's anything we can help with! Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Felipe Barreto [fbarreto at ucsd.edu] Sent: Tuesday, March 18, 2014 10:08 AM To: MAKER group Subject: [maker-devel] Size of initial EST training set for SNAP Hi, all, I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs. From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From barry.utah at gmail.com Tue Mar 18 10:26:45 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 18 Mar 2014 10:26:45 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Hi Felipe, I think that plan sounds quite reasonable. To address your primary concern, most gene prediction tools recommend something in the range of a minimum of a few hundred gene models to train on. Since you're an order of magnitude above that I think you're in good shape. Having said that, of course if you have concerns about biases in your training set you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high confidence genes that your set is missing. Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from I would suggest that you also consider including at least one other gene predictor. Augustus works well on a wide variety of genomes and while it is more difficult to train than SNAP it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome. This is one of the advantages of MAKER supervision, while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictor within MAKER. Thanks, B On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote: > Hi, all, > > I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.
> > From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). > > I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). > > Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. > > Thanks for any insight! > > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Tue Mar 18 10:59:39 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:59:39 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> References: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Message-ID: Thanks, guys, for the swift and informative response! 
I will try to train Augustus again, but at the very least, will include it with an arthropod HMM in my final run (in addition to my custom SNAP HMM). Cheers, Felipe On Tue, Mar 18, 2014 at 9:26 AM, Barry Moore wrote: > Hi Felipe, > > I think that plan sounds quite reasonable. To address your primary > concern, most gene prediction tools recommend something in the range of a > minimum of a few hundred gene models to train on. Since you're an order of > magnitude above that I think you're in good shape. Having said that, of > course if you have concerns about biases in your training set you may be > able to supplement it further by using a tool like CEGMA ( > http://korflab.ucdavis.edu/datasets/cegma/) to include high confidence > genes that your set is missing. > > Since the final gene set will only be as complete as the gene predictions > that MAKER has to choose from I would suggest that you also consider > including at least one other gene predictor. Augustus works well on a wide > variety of genomes and while it is more difficult to train than SNAP it > does accept hints from MAKER and will likely add to the diversity of the > final gene set, even if you choose to use an existing HMM that has some > reasonable relationship to your genome. This is one of the advantages of > MAKER supervision, while it would be best to train Augustus as well, MAKER > will ensure that the final models are not too far out of line with the > evidence and you'll likely see quite good results using a custom SNAP HMM > and an existing Augustus HMM as predictor within MAKER. > > Thanks, > > B > > On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote: > > Hi, all, > > I've been learning a lot from reading posts from this group, and finally > started doing actual runs of Maker on our current genome assembly > (arthropod, genome size ~230Mb). I started by training SNAP, but would > like to check my approach before continuing with longer runs.
> > From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I > deemed of very high quality based on blast alignments to Swiss-Prot (based > on query-subject coverage, bit score, etc). I then used only these 2000 > ESTs in a first Maker run using est2genome=1. The output returned 1500 > models (with the 500 "missing" models probably a result of single-exon > issues; not a concern at this point). > > I now plan on training SNAP with this first output, and then doing another > Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein > evidence, and 3) SNAP with the first HMM file. The output of this second > run will be used to re-train SNAP, and this second HMM file will be used in > a final "official" run (while continuing to provide the EST and protein > evidence, of course). > > Does this sound like a reasonable approach? Simply put, my main concern > is whether I'm using too few ESTs in my first est2genome step. > > Thanks for any insight! > > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From darasappan at gmail.com Tue Mar 18 13:27:11 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 14:27:11 -0500 Subject: [maker-devel] maker snap output files Message-ID: Hello, I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help. *maker.proteins.fasta *maker.snap_masked.proteins.fasta *maker.non_overlapping_ab_initio.proteins.fasta What is the difference among these? They all have different number of sequences. Similarly, with transcripts: maker.non_overlapping_ab_initio.transcripts.fasta maker.snap_masked.transcripts.fasta maker.transcripts.fasta Thanks Dhivya -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 18 13:34:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 13:34:05 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want) maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes) maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use interproscan (very slow) to analyze this file for false negatives. These files are also described in the README distributed with MAKER in the "MAKER OUTPUT" section. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 18, 2014 at 1:27 PM To: Carson Holt , Subject: maker snap output files Hello, I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).
It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help. *maker.proteins.fasta *maker.snap_masked.proteins.fasta *maker.non_overlapping_ab_initio.proteins.fasta What is the difference among these? They all have different number of sequences. Similarly, with transcripts: maker.non_overlapping_ab_initio.transcripts.fasta maker.snap_masked.transcripts.fasta maker.transcripts.fasta Thanks Dhivya -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Tue Mar 18 14:05:39 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 15:05:39 -0500 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Thanks Carson. Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-snap maker results? I assumed that annotations through alignment+annotation through prediction would equal more annotations? The unfiltered proteins file has more proteins though. Thanks Dhivya On Mar 18, 2014, at 2:34 PM, Carson Holt wrote: > maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want) > maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes) > maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use interproscan (very slow) to analyze this file for false negatives. > > > These files are also described in the README distributed with MAKER in the "MAKER OUTPUT" section.
> > Thanks, > Carson > > > > > From: dhivya arasappan > Date: Tuesday, March 18, 2014 at 1:27 PM > To: Carson Holt , > Subject: maker snap output files > > Hello, > > I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help. > > *maker.proteins.fasta > *maker.snap_masked.proteins.fasta > *maker.non_overlapping_ab_initio.proteins.fasta > > What is the difference among these? They all have different number of sequences. > > Similarly, with transcripts: > > maker.non_overlapping_ab_initio.transcripts.fasta > maker.snap_masked.transcripts.fasta > maker.transcripts.fasta > > Thanks > Dhivya > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 18 14:09:01 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 14:09:01 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Message-ID: There can also be hint based predictions. They may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over predict (as much as 10 fold). You should always review your annotations in something like Apollo, to see how the models compare to the evidence. Just counts don't really mean anything. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 18, 2014 at 2:05 PM To: Carson Holt Cc: Subject: Re: maker snap output files Thanks Carson. Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-snap maker results?
I assumed that annotations through alignment+annotation through prediction would equal more annotations? The unfiltered proteins file has more proteins though. Thanks Dhivya On Mar 18, 2014, at 2:34 PM, Carson Holt wrote: > maker.proteins.fasta - these are the final filtered and modified protein > models (this is what you want) > maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio > predictions (for reference purposes) > maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant > rejected models that do not overlap the maker.proteins.fasta entries. If you > think you are missing a gene, look for it here. Sometimes people use > interproscan (very slow) to analyze this file for false negatives. > > > These files are also described in the README distributed with MAKER in the > "MAKER OUTPUT" section. > > Thanks, > Carson > > > > > From: dhivya arasappan > Date: Tuesday, March 18, 2014 at 1:27 PM > To: Carson Holt , > Subject: maker snap output files > > Hello, > > I ran maker after running SNAP ab initio prediction (following instructions > from the maker tutorial). It ran successfully and when I ran fasta_merge, I > got several output fasta files. I'm unable to find information on the tutorial > about interpreting these different files. I'm hoping one of you can help. > > *maker.proteins.fasta > *maker.snap_masked.proteins.fasta > *maker.non_overlapping_ab_initio.proteins.fasta > > What is the difference among these? They all have different number of > sequences. > > Similarly, with transcripts: > > maker.non_overlapping_ab_initio.transcripts.fasta > maker.snap_masked.transcripts.fasta > maker.transcripts.fasta > > Thanks > Dhivya > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chrisbioinfo at gmail.com Wed Mar 19 05:09:57 2014 From: chrisbioinfo at gmail.com (Chris Bioinfo) Date: Wed, 19 Mar 2014 12:09:57 +0100 Subject: [maker-devel] Annotation with maker2 Message-ID: Hello, I'm installing/using maker2 for the first time and I have an error by using it. I certainly missing something, but I don't know what. I compile maker with no error message and I have all these directories after compilation: bin data GMOD INSTALL lib LICENSE MWAS perl README src Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I have this error: STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm --> rank=NA, hostname=belem ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig-dpp-500-500 ... ideas? Best, Christelle -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Mar 19 07:01:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 19 Mar 2014 07:01:35 -0600 Subject: [maker-devel] Annotation with maker2 In-Reply-To: References: Message-ID: Your problem is one of the following. You need to reinstall the DBD::SQLite module, you are running in a directory you don't have permissions for, you set your TMPDIR environmental variable or TMP value in maker_opts.ctl to an NFS mounted or memory mounted directory, or you are using a self compiled version of Perl (i.e.
not /usr/bin/perl) that has issues (probably with DB or SQLite modules). You can also completely delete the output directory, and start again to see if it was just a random error. You should look at each of those first. You can also run MAKER with the --debug command line flag and send it to me if all of those seem not to be the issue. Thanks, Carson From: Chris Bioinfo Date: Wednesday, March 19, 2014 at 5:09 AM To: Subject: [maker-devel] Annotation with maker2 Hello, I'm installing/using maker2 for the first time and I have an error by using it. I certainly missing something, but I don't know what. I compile maker with no error message and I have all these directories after compilation: bin data GMOD INSTALL lib LICENSE MWAS perl README src Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I have this error: STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm --> rank=NA, hostname=belem ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig-dpp-500-500 ... ideas? Best, Christelle _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rbharris at uw.edu Wed Mar 19 19:19:27 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Wed, 19 Mar 2014 18:19:27 -0700 Subject: [maker-devel] tradeoff between run time & file number Message-ID: Hi - I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once - and used the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time - probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output? Thanks, Rebecca -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Mar 19 23:43:11 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 20 Mar 2014 05:43:11 +0000 Subject: [maker-devel] tradeoff between run time & file number In-Reply-To: References: Message-ID: Hi Rebecca, So, as far as pruning down the dataset goes, I think that the biggest gains will be made by trimming the number of scaffolds that you annotate. What is the N50 of your 400,000 scaffold set? Usually, scaffolds shorter than 5k or 10kbp won't contribute much to the gene counts in the end. Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp. Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach those to your reply (to the maker-devel list), and I'll take a look. I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.
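Daniel's suggestion to drop short scaffolds can be applied to the assembly before rerunning MAKER. The Python sketch below is a minimal illustration, assuming an uncompressed FASTA assembly. It computes N50 and keeps only scaffolds at or above a length cutoff. The 10,000 bp default is simply the upper value Daniel mentions, and all function names are hypothetical, not MAKER tools.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA text, e.g. an open file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: Iterable[int]) -> int:
    """N50: walking scaffolds longest first, the length at which the running
    total first reaches half of the total assembly length (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def filter_by_length(records: Iterable[Record], min_len: int = 10_000) -> List[Record]:
    """Keep only scaffolds at least min_len bp long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text with fixed-width sequence lines."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""
```

For example, `records = list(read_fasta(open("assembly.fasta")))` followed by `sys.stdout.write(format_fasta(filter_by_length(records)))` would write the filtered set. Comparing `n50` and the number of records before and after filtering shows how much of the assembly a given cutoff removes.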
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From darasappan at gmail.com Thu Mar 20 11:22:47 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 20 Mar 2014 12:22:47 -0500 Subject: [maker-devel] maker snap output files In-Reply-To: References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com> Hi Carson, Given that I now have maker transcripts, ab initio predicted transcripts, and transcripts that don't overlap, which ones are reflected in the GFF file? The IDs in the GFF file (for exons, genes, mRNAs) all say something like "*snap-gene", so does this mean these are the genes from the SNAP prediction tool? Thanks dhivya On Mar 18, 2014, at 3:09 PM, Carson Holt wrote: > There can also be hint-based predictions. They may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over-predict (by as much as 10-fold).
You should always review your annotations in something like Apollo to see how the models compare to the evidence. Counts alone don't really mean anything. > > Thanks, > Carson > > From: dhivya arasappan > Date: Tuesday, March 18, 2014 at 2:05 PM > To: Carson Holt > Cc: > Subject: Re: maker snap output files > > Thanks Carson. > > Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-snap maker results? I assumed that annotation through alignment plus annotation through prediction would give more annotations. > > The unfiltered proteins file has more proteins, though. > > Thanks > Dhivya > > > > On Mar 18, 2014, at 2:34 PM, Carson Holt wrote: > >> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want) >> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes) >> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use InterProScan (very slow) to analyze this file for false negatives. >> >> >> These files are also described in the README distributed with MAKER, in the "MAKER OUTPUT" section. >> >> Thanks, >> Carson >> >> >> >> >> From: dhivya arasappan >> Date: Tuesday, March 18, 2014 at 1:27 PM >> To: Carson Holt , >> Subject: maker snap output files >> >> Hello, >> >> I ran maker after running SNAP ab initio prediction (following the instructions from the maker tutorial). It ran successfully, and when I ran fasta_merge I got several output fasta files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.
>> >> *maker.proteins.fasta >> *maker.snap_masked.proteins.fasta >> *maker.non_overlapping_ab_initio.proteins.fasta >> >> What is the difference among these? They all have a different number of sequences. >> >> Similarly, with transcripts: >> >> maker.non_overlapping_ab_initio.transcripts.fasta >> maker.snap_masked.transcripts.fasta >> maker.transcripts.fasta >> >> Thanks >> Dhivya >> >> > From carsonhh at gmail.com Thu Mar 20 11:24:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 20 Mar 2014 11:24:41 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com> References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com> Message-ID: MAKER transcripts will be the gene/mRNA/exon/CDS features. All other transcripts, from SNAP etc., will be match/match_part features in the GFF3. When you look at these in something like Apollo, they will be placed in different viewing panels based on their type. Thanks, Carson From carsonhh at gmail.com Thu Mar 20 11:53:24 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 20 Mar 2014 11:53:24 -0600 Subject: [maker-devel] tradeoff between run time & file number In-Reply-To: References: Message-ID: You may also want to try the GFF3 pass-through options. Basically, you give your GFF3 file to maker_gff and tell it which kinds of evidence to keep from your past run by setting the corresponding 'pass' options to 1. Then you can run without your fasta file inputs for ESTs, proteins, and repeats (also blank out the repeat masker species). The values will be passed forward from the GFF3 file into the current run. --Carson From carsonhh at gmail.com Fri Mar 21 08:23:18 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 08:23:18 -0600 Subject: [maker-devel] Annotation with maker2 In-Reply-To: References: Message-ID: Glad it's working. Let us know if anything else comes up. --Carson From: Chris Bioinfo Date: Friday, March 21, 2014 at 4:57 AM To: Carson Holt Subject: Re: [maker-devel] Annotation with maker2 Dear Carson, it works!! After many difficulties: I installed SQLite 3.8.4.1 yesterday, and it was "better" (no error message when launching sqlite3). Yet my test.db was still not created. Today I found the trick! The problem was only that the path where the db was created was too long. Thanks for your time and your help, Carson!
All the best, Christelle 2014-03-20 18:21 GMT+01:00 Carson Holt : > Also, you can use this command line to test both before and after installing: > > perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");' > > Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in > the error. > > --Carson > > > From: Carson Holt > Date: Thursday, March 20, 2014 at 11:03 AM > To: Chris Bioinfo > > Subject: Re: [maker-devel] Annotation with maker2 > > The failure is in SQLite, so you have to reinstall, i.e. 'force install > DBD::SQLite' in CPAN. Otherwise you are just keeping whatever module is > installed, which may have broken C bindings. > > You may also have to install SQLite 3.8.4.1 and then reinstall the perl > modules using the force option to force a recompile. > > --Carson > > > > From: Chris Bioinfo > Date: Thursday, March 20, 2014 at 10:57 AM > To: Carson Holt > Subject: Re: [maker-devel] Annotation with maker2 > > cpan[2]> install DBI > DBI is up to date (1.631). > > cpan[3]> install DBD::SQLite > DBD::SQLite is up to date (1.42). > > my test.db is not actually created: > > sqlite3 dpp_contig.maker.output/test.db > SQLite version 3.8.3.1 2014-02-11 14:52:19 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> > > > > > 2014-03-20 17:36 GMT+01:00 Carson Holt : >> I'm actually checking the mount points for the disk. SQLite won't work on >> filesystems that don't implement locks, and 'df' is a good way to infer some >> of that info. >> >> Basically, I still think this is SQLite failing on your system. You might >> need to reinstall SQLite and then reinstall the perl DBI and DBD::SQLite >> modules. >> >> You can also run a test command --> 'sqlite3 dpp_contig.maker.output/test.db' >> >> This will work if you have sqlite3 installed, and any error it gives may be >> informative.
>> >> --Carson >> >> From: Chris Bioinfo >> Date: Thursday, March 20, 2014 at 10:29 AM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Annotation with maker2 >> >> oh sorry >> >> my disks are quite full, but I guess there is still space for maker >> >> /dev/sdc1 19T 18T 934G 95% /home >> >> >> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo : >>> this: >>> >>> du -h dpp_contig.maker.output/ >>> 0 dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0 >>> 88K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500 >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500 >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F >>> 92K dpp_contig.maker.output/dpp_contig_datastore/05 >>> 92K dpp_contig.maker.output/dpp_contig_datastore >>> 4.0K dpp_contig.maker.output/dpp_contig_master_datastore_index.log >>> 4.0K dpp_contig.maker.output/maker_bopts.log >>> 4.0K dpp_contig.maker.output/maker_exe.log >>> 8.0K dpp_contig.maker.output/maker_opts.log >>> 16K dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1 >>> 44K dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1 >>> 14M dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10 >>> 32K dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1 >>> 14M dpp_contig.maker.output/mpi_blastdb >>> 0 dpp_contig.maker.output/seen.dbm >>> >>> >>> >>> 2014-03-20 17:10 GMT+01:00 Carson Holt : >>> >>>> What does 'df -h dpp_contig.maker.output' show? >>>> >>>> --Carson >>>> >>>> From: Chris Bioinfo >>>> Date: Thursday, March 20, 2014 at 10:00 AM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Annotation with maker2 >>>> >>>> sorry, I made a mistake with the directory!
>>>> >>>> I have these files: >>>> dpp_contig_datastore dpp_contig_master_datastore_index.log >>>> maker_bopts.log maker_exe.log maker_opts.log mpi_blastdb seen.dbm >>>> >>>> >>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo : >>>>> no, >>>>> >>>>> I have these files in the directory: >>>>> dpp_contig.fasta dpp_est.fasta hsap_contig.fasta >>>>> hsap_protein.fasta maker_exe.ctl >>>>> dpp_contig.maker.output dpp_protein.fasta hsap_est.fasta >>>>> maker_bopts.ctl maker_opts.ctl te_proteins.fasta >>>>> >>>>> >>>>> >>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt : >>>>> >>>>>> Did >>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist? >>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Chris Bioinfo >>>>>> Date: Thursday, March 20, 2014 at 9:50 AM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>> >>>>>> cdantec at belem:~$ /usr/bin/perl -v >>>>>> >>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for >>>>>> x86_64-linux-gnu-thread-multi >>>>>> (with 46 registered patches, see perl -V for more detail) >>>>>> >>>>>> Copyright 1987-2013, Larry Wall >>>>>> >>>>>> Perl may be copied only under the terms of either the Artistic License or the >>>>>> GNU General Public License, which may be found in the Perl 5 source kit. >>>>>> >>>>>> Complete documentation for Perl, including FAQ lists, should be found on >>>>>> this system using "man perl" or "perldoc perl". If you have access to the >>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page. >>>>>> >>>>>> >>>>>> >>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt : >>>>>>> What do you get when you type --> /usr/bin/perl -v >>>>>>> >>>>>>> The key to the error is this line --> >>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file >>>>>>> >>>>>>> Either the database doesn't exist, or it is corrupt. Does it exist? >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Thursday, March 20, 2014 at 9:25 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Dear Carson, >>>>>>> >>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my >>>>>>> directory, and configured the TMP value in maker_opts.ctl. Perl is in >>>>>>> /usr/bin/perl. >>>>>>> I have deleted the output directory many times, but I still get the same problem. >>>>>>> >>>>>>> So here is the debug output: >>>>>>> ****MODULE VERSION INFO >>>>>>> 0.05 Acme::Damn /usr/local/lib/perl/5.18.1/Acme/Damn.pm >>>>>>> 1.01 AnyDBM_File /usr/share/perl/5.18/AnyDBM_File.pm >>>>>>> 5.73 AutoLoader /usr/share/perl/5.18/AutoLoader.pm >>>>>>> UNKNOWN Bio::AnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm >>>>>>> UNKNOWN Bio::AnnotatableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm >>>>>>> UNKNOWN Bio::Annotation::Collection >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm >>>>>>> UNKNOWN Bio::Annotation::SimpleValue >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm >>>>>>> UNKNOWN Bio::Annotation::TypeManager >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm >>>>>>> UNKNOWN Bio::AnnotationCollectionI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm >>>>>>> UNKNOWN Bio::AnnotationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm >>>>>>> 1.006923 Bio::DB::Fasta >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm >>>>>>> UNKNOWN Bio::DB::InMemoryCache >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm >>>>>>> UNKNOWN Bio::DB::IndexedBase >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm >>>>>>> UNKNOWN Bio::DB::RandomAccessI >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm >>>>>>> UNKNOWN Bio::DB::SeqI >>>>>>>
/usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm >>>>>>> UNKNOWN Bio::DescribableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm >>>>>>> UNKNOWN Bio::Event::EventGeneratorI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm >>>>>>> UNKNOWN Bio::Event::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm >>>>>>> UNKNOWN Bio::Factory::SequenceFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm >>>>>>> UNKNOWN Bio::FeatureHolderI >>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm >>>>>>> UNKNOWN Bio::IdentifiableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm >>>>>>> UNKNOWN Bio::LocatableSeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm >>>>>>> UNKNOWN Bio::Location::Atomic >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm >>>>>>> UNKNOWN Bio::Location::CoordinatePolicyI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm >>>>>>> UNKNOWN Bio::Location::Fuzzy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm >>>>>>> UNKNOWN Bio::Location::FuzzyLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm >>>>>>> UNKNOWN Bio::Location::Simple >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm >>>>>>> UNKNOWN Bio::Location::Split >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm >>>>>>> UNKNOWN Bio::Location::SplitLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm >>>>>>> UNKNOWN Bio::Location::WidestCoordPolicy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm >>>>>>> UNKNOWN Bio::LocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm >>>>>>> UNKNOWN Bio::PrimarySeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm 
>>>>>>> 1.006923 Bio::PrimarySeqI >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm >>>>>>> UNKNOWN Bio::Range /usr/local/share/perl/5.18.1/Bio/Range.pm >>>>>>> UNKNOWN Bio::RangeI /usr/local/share/perl/5.18.1/Bio/RangeI.pm >>>>>>> 1.006923 Bio::Root::Exception >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm >>>>>>> UNKNOWN Bio::Root::HTTPget >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm >>>>>>> UNKNOWN Bio::Root::IO >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm >>>>>>> 1.006923 Bio::Root::Root >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm >>>>>>> 1.006923 Bio::Root::RootI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm >>>>>>> 1.006923 Bio::Root::Version >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm >>>>>>> UNKNOWN Bio::Search::HSP::GenericHSP >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm >>>>>>> UNKNOWN
Bio::Search::HSP::PhatHSP::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm >>>>>>> 1.006923 Bio::Search::Hit::GenericHit >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastn >>>>>>>
/usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm >>>>>>> 1.006923 Bio::Search::Hit::PhatHit::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm >>>>>>> 1.006923 Bio::Search::SearchUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm >>>>>>> UNKNOWN Bio::SearchIO >>>>>>>
/usr/local/share/perl/5.18.1/Bio/SearchIO.pm >>>>>>> UNKNOWN Bio::SearchIO::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm >>>>>>> UNKNOWN Bio::SearchIO::SearchResultEventBuilder >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm >>>>>>> UNKNOWN Bio::Seq /usr/local/share/perl/5.18.1/Bio/Seq.pm >>>>>>> UNKNOWN Bio::Seq::SeqFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm >>>>>>> UNKNOWN Bio::SeqAnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm >>>>>>> UNKNOWN Bio::SeqFeature::FeaturePair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm >>>>>>> UNKNOWN Bio::SeqFeature::Generic >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm >>>>>>> UNKNOWN Bio::SeqFeature::Similarity >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm >>>>>>> UNKNOWN Bio::SeqFeature::SimilarityPair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm >>>>>>> UNKNOWN Bio::SeqFeatureI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm >>>>>>> UNKNOWN Bio::SeqI /usr/local/share/perl/5.18.1/Bio/SeqI.pm >>>>>>> UNKNOWN Bio::SeqUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm >>>>>>> 1.006923 Bio::Tools::CodonTable >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm >>>>>>> UNKNOWN Bio::Tools::GFF >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm >>>>>>> 1.006923 Bio::Tools::IUPAC >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm >>>>>>> 7.3 Bit::Vector /usr/local/lib/perl/5.18.1/Bit/Vector.pm >>>>>>> 0.01 CGL::Annotation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm >>>>>>> 0.01 CGL::Annotation::Feature >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm >>>>>>> 0.01 CGL::Annotation::Feature::Contig >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm >>>>>>> 0.01 CGL::Annotation::Feature::Exon >>>>>>>
/usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm >>>>>>> 0.01 CGL::Annotation::Feature::Gene >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm >>>>>>> 0.01 CGL::Annotation::Feature::Intron >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm >>>>>>> 0.01 CGL::Annotation::Feature::Protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm >>>>>>> 0.01 CGL::Annotation::Feature::Sequence_variant >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm >>>>>>> 0.01 CGL::Annotation::Feature::Transcript >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm >>>>>>> 0.01 CGL::Annotation::FeatureLocation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm >>>>>>> 0.01 CGL::Annotation::FeatureRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm >>>>>>> 0.01 CGL::Annotation::Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm >>>>>>> 0.01 CGL::Annotation::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm >>>>>>> 0.01 CGL::Clone >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm >>>>>>> 0.01 CGL::Ontology::Node >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm >>>>>>> 0.01 CGL::Ontology::NodeRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm >>>>>>> 0.01 CGL::Ontology::Ontology >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm >>>>>>> 0.01 CGL::Ontology::Parser::OBO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm >>>>>>> 0.01 CGL::Ontology::SO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm >>>>>>> 0.01
CGL::Ontology::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm >>>>>>> 0.01 CGL::Revcomp >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm >>>>>>> 0.01 CGL::TranslationMachine >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm >>>>>>> 1.32 Carp /usr/local/share/perl/5.18.1/Carp.pm >>>>>>> 1.32 Carp::Heavy /usr/local/share/perl/5.18.1/Carp/Heavy.pm >>>>>>> 0.64 Class::Struct /usr/share/perl/5.18/Class/Struct.pm >>>>>>> 0.36 Clone /usr/local/lib/perl/5.18.1/Clone.pm >>>>>>> 5.018001 Config /usr/lib/perl/5.18/Config.pm >>>>>>> 3.40 Cwd /usr/lib/perl/5.18/Cwd.pm >>>>>>> 1.42 DBD::SQLite /usr/local/lib/perl/5.18.1/DBD/SQLite.pm >>>>>>> 1.631 DBI /usr/local/lib/perl/5.18.1/DBI.pm >>>>>>> 1.827 DB_File /usr/lib/perl/5.18/DB_File.pm >>>>>>> 2.145 Data::Dumper /usr/lib/perl/5.18/Data/Dumper.pm >>>>>>> 0.11 Datastore::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm >>>>>>> 0.01 Datastore::MD5 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm >>>>>>> 2.53 Digest::MD5 /usr/local/lib/perl/5.18.1/Digest/MD5.pm >>>>>>> 1.16 Digest::base /usr/share/perl/5.18/Digest/base.pm >>>>>>> >>>>>>> UNKNOWN Dumper::GFF::GFFV3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm >>>>>>> UNKNOWN Dumper::XML::Game >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm >>>>>>> UNKNOWN Dumper::XML::Game_Xml >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm >>>>>>> 1.18 DynaLoader /usr/lib/perl/5.18/DynaLoader.pm >>>>>>> 1.18 Errno /usr/lib/perl/5.18/Errno.pm >>>>>>> 0.17015 Error >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm >>>>>>> UNKNOWN Error::Simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm >>>>>>> 5.68 Exporter /usr/share/perl/5.18/Exporter.pm >>>>>>> 5.68 Exporter::Heavy /usr/share/perl/5.18/Exporter/Heavy.pm >>>>>>> UNKNOWN Fasta >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Fasta.pm >>>>>>> UNKNOWN FastaChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm >>>>>>> UNKNOWN FastaChunker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm >>>>>>> UNKNOWN FastaDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm >>>>>>> UNKNOWN FastaFile >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm >>>>>>> UNKNOWN FastaSeq >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm >>>>>>> 1.11 Fcntl /usr/lib/perl/5.18/Fcntl.pm >>>>>>> 2.84 File::Basename /usr/share/perl/5.18/File/Basename.pm >>>>>>> 2.26 File::Copy /usr/share/perl/5.18/File/Copy.pm >>>>>>> 1.20 File::Glob /usr/lib/perl/5.18/File/Glob.pm >>>>>>> 1.20 File::NFSLock >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm >>>>>>> 2.09 File::Path /usr/share/perl/5.18/File/Path.pm >>>>>>> 3.40 File::Spec /usr/lib/perl/5.18/File/Spec.pm >>>>>>> 3.40 File::Spec::Unix /usr/lib/perl/5.18/File/Spec/Unix.pm >>>>>>> 0.2304 File::Temp /usr/local/share/perl/5.18.1/File/Temp.pm >>>>>>> 1.09 File::Which >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm >>>>>>> 2.02 FileHandle /usr/share/perl/5.18/FileHandle.pm >>>>>>> 1.51 FindBin /usr/share/perl/5.18/FindBin.pm >>>>>>> UNKNOWN GFFDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> UNKNOWN GI /usr/local/annotation/maker2.31/bin/../lib/GI.pm >>>>>>> 2.42 Getopt::Long /usr/local/share/perl/5.18.1/Getopt/Long.pm >>>>>>> 6.02 HTTP::Date /usr/share/perl5/HTTP/Date.pm >>>>>>> 6.05 HTTP::Headers /usr/share/perl5/HTTP/Headers.pm >>>>>>> 6.06 HTTP::Message /usr/share/perl5/HTTP/Message.pm >>>>>>> 6.00 HTTP::Request /usr/share/perl5/HTTP/Request.pm >>>>>>> 6.04 HTTP::Response /usr/share/perl5/HTTP/Response.pm >>>>>>> 6.03 HTTP::Status /usr/share/perl5/HTTP/Status.pm >>>>>>> 1.28 IO /usr/lib/perl/5.18/IO.pm >>>>>>> 1.16 IO::File /usr/lib/perl/5.18/IO/File.pm >>>>>>> 1.34 IO::Handle /usr/lib/perl/5.18/IO/Handle.pm 
>>>>>>> 1.1 IO::Seekable /usr/lib/perl/5.18/IO/Seekable.pm >>>>>>> 1.21 IO::Select /usr/lib/perl/5.18/IO/Select.pm >>>>>>> 1.36 IO::Socket /usr/lib/perl/5.18/IO/Socket.pm >>>>>>> 1.33 IO::Socket::INET /usr/lib/perl/5.18/IO/Socket/INET.pm >>>>>>> 1.24 IO::Socket::UNIX /usr/lib/perl/5.18/IO/Socket/UNIX.pm >>>>>>> 1.13 IPC::Open3 /usr/share/perl/5.18/IPC/Open3.pm >>>>>>> 0.53 Inline /usr/local/share/perl/5.18.1/Inline.pm >>>>>>> UNKNOWN Inline::denter >>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm >>>>>>> UNKNOWN Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm >>>>>>> UNKNOWN Iterator::Any >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm >>>>>>> UNKNOWN Iterator::Fasta >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm >>>>>>> UNKNOWN Iterator::GFF3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm >>>>>>> 6.05 LWP /usr/share/perl5/LWP.pm >>>>>>> UNKNOWN LWP::MemberMixin /usr/share/perl5/LWP/MemberMixin.pm >>>>>>> 6.00 LWP::Protocol /usr/share/perl5/LWP/Protocol.pm >>>>>>> 6.05 LWP::UserAgent /usr/share/perl5/LWP/UserAgent.pm >>>>>>> 0.33 List::MoreUtils >>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm >>>>>>> 1.38 List::Util /usr/local/lib/perl/5.18.1/List/Util.pm >>>>>>> UNKNOWN MAKER::ConfigData >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm >>>>>>> 1.32 POSIX /usr/lib/perl/5.18/POSIX.pm >>>>>>> 0.01 Parallel::Application::MPI >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI >>>>>>> .pm >>>>>>> 0.02 Perl::Unsafe::Signals >>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm >>>>>>> UNKNOWN PhatHit_utils >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm >>>>>>> UNKNOWN PostData >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm >>>>>>> 1.0 Proc::ProcessTable_simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm >>>>>>> 1.0 Proc::Signal >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm >>>>>>> UNKNOWN Process::MpiChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm >>>>>>> UNKNOWN Process::MpiTiers >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm >>>>>>> 1.38 Scalar::Util /usr/local/lib/perl/5.18.1/Scalar/Util.pm >>>>>>> 1.02 SelectSaver /usr/share/perl/5.18/SelectSaver.pm >>>>>>> UNKNOWN Shadower >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm >>>>>>> UNKNOWN SimpleCluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm >>>>>>> 2.009 Socket /usr/lib/perl/5.18/Socket.pm >>>>>>> UNKNOWN SpaceBase >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm >>>>>>> 2.45 Storable /usr/local/lib/perl/5.18.1/Storable.pm >>>>>>> 1.07 Symbol /usr/share/perl/5.18/Symbol.pm >>>>>>> 1.17 Sys::Hostname /usr/lib/perl/5.18/Sys/Hostname.pm >>>>>>> 0.21 Sys::SigAction >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm >>>>>>> UNKNOWN Sys::SigAction::Alarm >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm >>>>>>> 4.02 Term::ANSIColor /usr/share/perl/5.18/Term/ANSIColor.pm >>>>>>> 4.2 Tie::Handle /usr/share/perl/5.18/Tie/Handle.pm >>>>>>> 1.04 Tie::Hash /usr/share/perl/5.18/Tie/Hash.pm >>>>>>> 4.3 Tie::StdHandle /usr/share/perl/5.18/Tie/StdHandle.pm >>>>>>> 1.9726 Time::HiRes /usr/local/lib/perl/5.18.1/Time/HiRes.pm >>>>>>> 1.2300 Time::Local /usr/share/perl/5.18/Time/Local.pm >>>>>>> 1.60 URI /usr/share/perl5/URI.pm >>>>>>> 3.31 URI::Escape /usr/share/perl5/URI/Escape.pm >>>>>>> UNKNOWN Widget >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm >>>>>>> UNKNOWN Widget::RepeatMasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm >>>>>>> UNKNOWN Widget::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastx 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome. >>>>>>> pm >>>>>>> UNKNOWN Widget::exonerate::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.p >>>>>>> m >>>>>>> UNKNOWN Widget::exonerate::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2geno >>>>>>> me.pm >>>>>>> UNKNOWN Widget::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm >>>>>>> >>>>>>> UNKNOWN Widget::formater >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm >>>>>>> >>>>>>> UNKNOWN Widget::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm >>>>>>> >>>>>>> UNKNOWN Widget::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm >>>>>>> >>>>>>> UNKNOWN Widget::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm >>>>>>> >>>>>>> UNKNOWN Widget::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm >>>>>>> >>>>>>> 0.16 XSLoader /usr/share/perl/5.18/XSLoader.pm >>>>>>> 0.21 attributes /usr/lib/perl/5.18/attributes.pm >>>>>>> >>>>>>> 2.18 base /usr/share/perl/5.18/base.pm >>>>>>> 1.04 bytes /usr/share/perl/5.18/bytes.pm >>>>>>> UNKNOWN clean >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm >>>>>>> UNKNOWN cluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm >>>>>>> >>>>>>> UNKNOWN compare >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm >>>>>>> >>>>>>> 1.27 constant /usr/share/perl/5.18/constant.pm >>>>>>> >>>>>>> UNKNOWN ds_utility >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm >>>>>>> >>>>>>> UNKNOWN exonerate::splice_info >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm >>>>>>> >>>>>>> 0.34 forks /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 2.08001 forks::Devel::Symdump >>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm >>>>>>> 0.34 forks::shared /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> >>>>>>> 0.34 forks::signals >>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm >>>>>>> 1.00 integer /usr/share/perl/5.18/integer.pm >>>>>>> >>>>>>> 0.63 lib /usr/lib/perl/5.18/lib.pm >>>>>>> 1.02 locale /usr/share/perl/5.18/locale.pm >>>>>>> UNKNOWN maker::auto_annotator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm >>>>>>> >>>>>>> UNKNOWN maker::join >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm >>>>>>> >>>>>>> UNKNOWN maker::quality_index >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm >>>>>>> >>>>>>> UNKNOWN maker::sens_spec >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm >>>>>>> >>>>>>> 1.22 overload /usr/share/perl/5.18/overload.pm >>>>>>> >>>>>>> 0.02 overloading /usr/share/perl/5.18/overloading.pm >>>>>>> >>>>>>> 0.225 parent /usr/share/perl/5.18/parent.pm >>>>>>> UNKNOWN polisher >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::altest >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::est >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm >>>>>>> >>>>>>> UNKNOWN repeat_mask_seq >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm >>>>>>> >>>>>>> 0.1 runlog >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm >>>>>>> UNKNOWN shadow_AED >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm >>>>>>> 1.07 sigtrap /usr/share/perl/5.18/sigtrap.pm >>>>>>> >>>>>>> 1.07 strict /usr/share/perl/5.18/strict.pm >>>>>>> 1.77 threads /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 1.33 threads::shared >>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> 1.03 vars /usr/share/perl/5.18/vars.pm >>>>>>> 1.18 warnings /usr/share/perl/5.18/warnings.pm >>>>>>> >>>>>>> 1.02 warnings::register >>>>>>> /usr/share/perl/5.18/warnings/register.pm >>>>>>> STATUS: Parsing control files... >>>>>>> Calling GI::load_control_files at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452. >>>>>>> Calling GI::new_instance_temp at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463. >>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 465. >>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 483. >>>>>>> STATUS: Processing and indexing input FASTA files... >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 529. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. 
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. 
>>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. 
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 574. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 623. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> STATUS: Setting up database for any GFF3 input... 
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 629. >>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 631. >>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 635. >>>>>>> A data structure will be created for you at: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_datastore >>>>>>> >>>>>>> To access files for individual sequences use the datastore index: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_master_datastore_index.log >>>>>>> >>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 636. >>>>>>> Calling Iterator::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639. >>>>>>> Calling Iterator::Fasta::skip_file at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641. >>>>>>> Calling Iterator::Fasta::step at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643. >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!! 
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... 
>>>>>>> #--------------------------------------------------------------------- >>>>>>> Now retrying the contig!! >>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> Tries: 2!! >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... >>>>>>> >>>>>>> >>>>>>> Maker is now finished!!! >>>>>>> >>>>>>> Many thanks for your help >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt : >>>>>>> Your problem is one of the following: you need to reinstall the >>>>>>> DBD::SQLite module; you are running in a directory you don't have >>>>>>> permissions for; you set your TMPDIR environment variable or the TMP value >>>>>>> in maker_opts.ctl to an NFS-mounted or memory-mounted directory; or you >>>>>>> are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that >>>>>>> has issues (probably with the DB or SQLite modules). You can also >>>>>>> completely delete the output directory and start again to see if it was >>>>>>> just a random error. You should look at each of those first. You can >>>>>>> also run MAKER with the --debug command line flag and send the output to me if >>>>>>> none of those seem to be the issue. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Wednesday, March 19, 2014 at 5:09 AM >>>>>>> To: >>>>>>> Subject: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I'm installing/using maker2 for the first time and I get an error when >>>>>>> using it. >>>>>>> >>>>>>> I am certainly missing something, but I don't know what. >>>>>>> >>>>>>> I compiled maker with no error message and I have all these directories >>>>>>> after compilation: >>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README src >>>>>>> >>>>>>> Nevertheless, when I try maker2 on the test data (dpp_contig.fasta), I >>>>>>> get this error: >>>>>>> >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) >>>>>>> failed: unable to open database file at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> ... >>>>>>> >>>>>>> Any ideas? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> >>>>> >>>> >>> >> > From jfierst at uoregon.edu Fri Mar 21 09:43:59 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 21 Mar 2014 08:43:59 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi, I just wanted to say thanks for all your help. I did the reciprocal best BLAST hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you > need to know which genes you want to translate names to (two-column input > file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST > forward is the best way to do this, although I do think maker_map_ids is > the tool to use in the end.
The question is how to make a list of IDs to > translate as the input to maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then > do reciprocal best BLAST hits. To do this you BLAST your reference > proteins against your maker proteins. Then do the opposite, BLAST your > maker proteins against your reference proteins. If they are each > other's best hit, then they are orthologous, and you can safely make a two > column entry for the maker_map_ids input (i.e. maker-gene-1 translates into > smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, So do you have one strain that you want to use as the reference > for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > Sent: Friday, March 14, 2014 10:06 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] associating gene names between related strains > > Hi, > > We are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name.
I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna From carsonhh at gmail.com Fri Mar 21 09:54:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 09:54:15 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I'm glad we could help. --Carson
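The reciprocal best BLAST hit approach Carson describes (and Janna used) can be sketched in Python as follows. This is a hypothetical helper, not one of the MAKER scripts. It assumes both BLASTP searches were run with BLAST+ tabular output (`-outfmt 6`), where column 1 is the query ID, column 2 the subject ID and column 12 the bit score. It writes the two-column file (column 1 original ID, column 2 new ID) that Carson describes for maker_map_ids; Janna fed such a mapping to map_fasta_ids and map_gff_ids. The demo IDs are made up.

```python
import csv
import io
import sys
from typing import Dict, Iterable, List, TextIO, Tuple


def best_hits(blast_tab: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to its highest-scoring subject ID.

    Assumes BLAST+ tabular output (-outfmt 6): column 1 = query ID,
    column 2 = subject ID, column 12 = bit score.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(maker_vs_ref: Iterable[str],
                         ref_vs_maker: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (maker ID, reference ID) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted((m, r) for m, r in forward.items() if reverse.get(r) == m)


def write_id_map(pairs: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write the two-column map: original (MAKER) ID <TAB> new ID."""
    for maker_id, ref_id in pairs:
        out.write(f"{maker_id}\t{ref_id}\n")


if __name__ == "__main__":
    def row(query: str, subject: str, bits: float) -> str:
        # 12 tab-separated columns; only columns 1, 2 and 12 are read here
        return "\t".join([query, subject] + ["0"] * 9 + [str(bits)])

    maker_vs_ref = io.StringIO("\n".join([
        row("maker-gene-1", "smug1", 500), row("maker-gene-1", "smug2", 120),
        row("maker-gene-2", "smug2", 300),
    ]))
    ref_vs_maker = io.StringIO("\n".join([
        row("smug1", "maker-gene-1", 498),
        row("smug2", "maker-gene-3", 350), row("smug2", "maker-gene-2", 290),
    ]))
    # Only maker-gene-1/smug1 is reciprocal; smug2's best hit is maker-gene-3.
    write_id_map(reciprocal_best_hits(maker_vs_ref, ref_vs_maker), sys.stdout)
```

Keeping only mutual best hits drops ties and one-to-many relationships, so genes without a reciprocal partner are simply left out of the map.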
From Hossein.Borhan at AGR.GC.CA Fri Mar 21 10:41:38 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Fri, 21 Mar 2014 16:41:38 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Dear Carson I ran maker with the modified .pm files and it resolved the problem with the fasta output. Thanks a lot for your help. HB On 14-03-17 1:45 PM, "Carson Holt" wrote: >I have attached 4 files for you to place in the .../maker/Widgets/ >directory. > >The *blast.pm files will suppress the BLAST+ failures you are getting >(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same >effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. >In the latest release NCBI redid all their warnings and error codes, so it >spits out a lot of garbage and fails with different messages than it did >before. For example, BLAST now warns you every time it encounters a fasta >header with a comment (virtually every fasta entry in existence falls in >this category), so your screen will be awash with meaningless warning >messages. > >The fgenesh.pm file will fix the other failure, which only occurs if you >use fgenesh simultaneously with the correct_est_fusion=1 option. No other >predictors are affected. > >Thanks, >Carson > > >On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: > >>Dear Carson >> >>Sorry for the late reply. I was away for a couple of days. I have uploaded >>the output files plus control and error output to the FTP site that you >>provided. >>The user ID is borhanh >> >>I used blast+ for this run. >> >>Regards >> >>HB >> >>On 14-03-13 10:00 AM, "Carson Holt" >>wrote: >> >>>Just resending this to the correct maker-devel address. Please, when >>>replying, do not CC the incorrect maker-devel-bounce address.
>>> >>>Thanks, >>>Carson >>> >>>On 3/13/14, 9:56 AM, "Carson Holt" >>>wrote: >>> >>>>FGENESH is not a heavily used tool, so depending on which version it is >>>>(either too old or too new), output might be slightly different, which >>>>could cause incorrect parsing. Could you tar up your maker.output folder >>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>(send me your user/guest ID after you upload)? >>>> >>>>For the BLAST error, use BLAST+ instead. You are using blastall, which is >>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>blast type in maker_bopts.ctl and the location of executables in >>>>maker_exe.ctl. >>>> >>>>Thanks, >>>>Carson >>>> >>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Dear Maker users >>>>> >>>>>I ran maker (2.31) on a fungal genome and found that it inserted the >>>>>word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>>into the nucleotide sequence of some of the genes. This seems to >>>>>be related to transcripts predicted by fgenesh_masked.
>>>>> >>>>>Here is an example for one of the genes: >>>>> >>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>TTTCGACAAGC >>>>> >>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>without such a problem. I checked the sequence for the scaffold related >>>>>to one of the affected transcripts and there was no error in the >>>>>sequence. I am not sure what is causing this. The only error that I could >>>>>spot in the output error file is the following: >>>>> >>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>contexts/frames of query sequences.
>>>>> >>>>>Your help is appreciated >>>>> >>>>>HB >>>> >>> >> > From carsonhh at gmail.com Fri Mar 21 10:43:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 10:43:10 -0600 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts Message-ID: Thanks for letting me know. --Carson
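To list which transcripts in an existing run were hit by this kind of corruption, a scan for characters outside the IUPAC nucleotide alphabet is enough. The following is a hypothetical Python sketch, not part of MAKER. Because S, C, A and R are themselves valid IUPAC ambiguity codes, it reports the characters of `SCALAR(0x...)` that are not IUPAC codes (such as L, X, the parentheses and the numeric digits) rather than the whole word. The script name in the usage comment is made up.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

# IUPAC nucleotide codes, including ambiguity codes; anything else is flagged.
VALID_BASES = set("ACGTUNRYSWKMBDHV")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def find_bad_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (record ID, sorted offending characters) for each bad record."""
    bad = []
    for header, seq in read_fasta(lines):
        invalid = sorted(set(seq.upper()) - VALID_BASES)
        if invalid:
            record_id = (header.split() or [""])[0]
            bad.append((record_id, "".join(invalid)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_bases.py transcripts.fasta
    with open(sys.argv[1]) as handle:
        for record_id, chars in find_bad_records(handle):
            print(f"{record_id}\t{chars}")
```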
From marc.hoeppner at imbim.uu.se Mon Mar 24 04:08:25 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Mon, 24 Mar 2014 10:08:25 +0000 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se> Hi, I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and EST alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel. However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional, and is there a workaround? Or is the only option to run this with fasta files? Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From sujaikumar at gmail.com Mon Mar 24 08:15:16 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 14:15:16 +0000 Subject: [maker-devel] Dashes in transcript predictions Message-ID: Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them.
Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. All the .maker.output files are attached. -------------- next part -------------- A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz Type: application/x-gzip Size: 45641 bytes From carsonhh at gmail.com Mon Mar 24 10:49:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 10:49:46 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: I've actually never seen that before, but looking through your output it appears to be specifically caused by setting correct_est_fusion=1, and how it interacts with some features of your dataset. I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download. Thanks, Carson -------------- next part -------------- A non-text attachment was scrubbed... Name: join.pm Type: text/x-perl-script Size: 18644 bytes From carsonhh at gmail.com Mon Mar 24 11:05:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 11:05:15 -0600 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: It's not so much intentional as it is a limitation of the information in GFF3 format alignments. Right now protein2genome for eukaryotes will only try to make exonerate-derived alignments work, because they have been polished around splice sites and MAKER still has access to the original protein sequence and alignment cigar string for additional filtering, etc. With GFF3 pass-through the algorithm doesn't know nearly as much about what is passed in.
For example, the protein sequence is gone, cigar alignment strings are rarely included (Gap= attribute in GFF3), and it's not always clear if the alignment was polished for splice sites. Also, since protein2genome=1 is expected to be used only to generate an initial training set, and not for final annotations, this is considered a reasonable restriction. If you still really want to force protein alignments from a GFF3 to be considered as potential models, you could put them in as pred_gff, in which case they will always be considered as potential models. Of course it will be relatively ugly, because you lack things I mentioned before, such as the alignment cigar string and original protein sequence, that are normally used to filter protein2genome results for inclusion as models. --Carson From carsonhh at gmail.com Mon Mar 24 12:15:39 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 12:15:39 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: One more note on this. The sequence is actually fully correct if you just remove the '-' characters. So if you don't want to rerun MAKER with the patch, then you can use the attached script to just repair the transcript file by removing the '-' characters. Your GFF3 files and protein files should already be correct as is. Usage --> perl fix_dash transcript_file.fasta > new_file.fasta You may need to place the script in the .../maker/bin/ directory so it can detect BioPerl if you don't have BioPerl installed system-wide. Thanks, Carson
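Carson's fix_dash script is an attachment and is not reproduced in the archive. The following hypothetical Python equivalent assumes, as Carson states, that deleting the '-' characters from the sequence is the whole repair. It only touches sequence lines, because MAKER IDs such as `snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1` contain hyphens themselves, and stripping '-' from header lines would corrupt them.

```python
import sys
from typing import Iterable, Iterator


def strip_gap_chars(lines: Iterable[str], gap: str = "-") -> Iterator[str]:
    """Yield FASTA lines with `gap` removed from sequence lines only.

    Header lines (starting with '>') are passed through unchanged, because
    MAKER IDs and attributes contain hyphens of their own.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace(gap, "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python strip_dashes.py transcript_file.fasta > new_file.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(strip_gap_chars(handle))
```

Usage mirrors Carson's example (the script name above is made up). The repaired sequence lines may end up with uneven widths, which most FASTA parsers tolerate.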
From sujaikumar at gmail.com Mon Mar 24 12:17:02 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 18:17:02 +0000 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: Wow. That was a super quick response. Thanks very much for confirming the problem and the fixes! From diana.garnica at anu.edu.au Mon Mar 24 17:11:01 2014 From: diana.garnica at anu.edu.au (Diana Garnica Moreno) Date: Mon, 24 Mar 2014 23:11:01 +0000 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> Hi there, We recently assembled a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the fasta sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? Thanks! Diana
From carsonhh at gmail.com Mon Mar 24 17:25:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 17:25:09 -0600 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Message-ID: You are probably getting the wrong proteins from your scripts because you are not taking into account the 5' and 3' UTRs in the transcript. For example:
>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240
The 5' UTR is 261 bp and the 3' UTR is 22 bp long. Both have to be trimmed before translating the transcript into a protein. Once they are trimmed, you can use frame 0 for the translation. The fasta_tool that comes with MAKER can be used to quickly trim the UTRs. Example:
fasta_tool maker_transcripts.fasta --trim_maker_utr
Then you can try your other scripts again. Thanks, Carson From: Diana Garnica Moreno Date: Monday, March 24, 2014 at 5:11 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER Hi there, We recently assembled a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? Thanks!
Diana From daniel.standage at gmail.com Tue Mar 25 07:24:14 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Tue, 25 Mar 2014 09:24:14 -0400 Subject: [maker-devel] Maker iPlant image Message-ID: Greetings, I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University From ernesto at ebi.ac.uk Tue Mar 25 04:10:59 2014 From: ernesto at ebi.ac.uk (ernesto lowy gallego) Date: Tue, 25 Mar 2014 10:10:59 +0000 Subject: [maker-devel] Incorrect translation start codon Message-ID: <53315633.2070702@ebi.ac.uk> Hi, I have been inspecting the MAKER predictions and I have detected a situation that appears with some frequency (see the attached Apollo screenshot, which illustrates the situation I am going to describe). When there is est2genome evidence supporting the prediction of the 5' UTR region, I have noticed that in some of these transcripts with a 5' UTR, MAKER fails to identify the correct downstream ATG start codon and instead treats a TTG codon (coding for L) as the protein start. The correct ATG start codon is further downstream, as the ab initio predictors (SNAP+AUGUSTUS) correctly predict in this case (see the attached screenshot). Any comments on this? Thanks!
ernesto -- Developer VectorBase | Ensembl Genomes -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2014-03-25 at 09.34.16.png Type: image/png Size: 32220 bytes Desc: not available URL: From carsonhh at gmail.com Tue Mar 25 08:19:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Mar 2014 08:19:22 -0600 Subject: [maker-devel] Incorrect translation start codon In-Reply-To: <53315633.2070702@ebi.ac.uk> References: <53315633.2070702@ebi.ac.uk> Message-ID: This is caused by BioPerl's is_start_codon method and default codon table returning true for non-canonical start codons. It was resolved some time ago (see the previous discussion: https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY). Make sure you are using the most recent version of MAKER (currently 2.31). Thanks, Carson On 3/25/14, 4:10 AM, "ernesto lowy gallego" wrote: >Hi, > >I have been inspecting the MAKER predictions and I have detected a situation >that appears with some frequency >(see the attached Apollo screenshot, which illustrates the situation I am going to >describe). > >When there is est2genome evidence supporting the prediction of >the 5' UTR region, I have noticed that in some of these transcripts >with a 5' UTR, MAKER fails to identify the correct downstream ATG >start codon and instead treats a TTG codon (coding for L) as the >protein start. The correct ATG start codon is further >downstream, as the ab initio predictors (SNAP+AUGUSTUS) correctly >predict in this case (see the attached screenshot). > >Any comments on this? > >Thanks!
> >ernesto > >-- >Developer > >VectorBase | Ensembl Genomes From carsonhh at gmail.com Tue Mar 25 08:24:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Mar 2014 08:24:36 -0600 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: --> /opt/maker/bin/maker It looks like most preinstalled software is under /opt on the image. Thanks, Carson From: Daniel Standage Date: Tuesday, March 25, 2014 at 7:24 AM To: Maker Mailing List Subject: [maker-devel] Maker iPlant image Greetings, I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University From darasappan at gmail.com Tue Mar 25 10:33:59 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 25 Mar 2014 11:33:59 -0500 Subject: [maker-devel] maker to EvidenceModeler Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> Hi Carson and others, Is there an easy tool/pipeline available as part of the MAKER utilities to convert MAKER and SNAP output into files accepted by EvidenceModeler? It looks like it also just needs GFF files, but with a few tweaks.
EvidenceModeler seems better equipped to handle PASA annotation results than MAKER results. Thanks Dhivya From barry.utah at gmail.com Tue Mar 25 11:51:38 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 25 Mar 2014 11:51:38 -0600 Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com> Message-ID: Hi Diana, There is a Perl library, the Genome Annotation Library (GAL), that is designed to make writing code like this easy. I just added a script to this library called gal_CDS_sequence, which you would run like this:
gal_CDS_sequence --translate genes.gff3 genome.fasta
The focus of GAL is to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files. You can access the library here: http://www.sequenceontology.org/software/GAL.html Support for GAL is available via the SO mailing list: https://lists.sourceforge.net/lists/listinfo/song-devel Hope that helps, Barry On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote: > Hi there, > > We recently assembled a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example).
I am not sure why it happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included? > > Thanks! > > Diana Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From kchilds at plantbiology.msu.edu Wed Mar 26 08:21:36 2014 From: kchilds at plantbiology.msu.edu (Childs, Kevin) Date: Wed, 26 Mar 2014 14:21:36 +0000 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: Daniel, There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environment variables immediately after starting an ssh session:
export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp
The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires BLAT, which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image. Augustus should work with a stock HMM on this image.
I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Plant Biology Department Michigan State University kchilds at plantbiology.msu.edu 517-775-2844 (m) 517-353-5969 (l) On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > --> /opt/maker/bin/maker > > It looks like most preinstalled software is under /opt on the image. > > Thanks, > Carson > > > From: Daniel Standage > Date: Tuesday, March 25, 2014 at 7:24 AM > To: Maker Mailing List > Subject: [maker-devel] Maker iPlant image > > Greetings, > > I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University From steinj at cshl.edu Wed Mar 26 12:41:37 2014 From: steinj at cshl.edu (Stein, Joshua) Date: Wed, 26 Mar 2014 18:41:37 +0000 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: Also please note that there is a tutorial available here, which is particularly important if you want to use the image in MPI mode: https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial Josh Joshua Stein, PhD Manager, Sci.
Informatics III Cold Spring Harbor Laboratory steinj at cshl.edu http://ware.cshl.org/ On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" wrote: > Daniel, > > There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environmental variables immediately after starting an ssh session. > > export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/ > export ZOE=/opt/maker/exe/snap > export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config > export TMP=/tmp > > The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires blat which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image. > > Augustus should work with a stock hmm on this image. > > I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. > > Kevin Childs > > > --- > Kevin Childs, PhD > > Assistant Professor - Fixed Term > Plant Biology Department > Michigan State University > > kchilds at plantbiology.msu.edu > 517-775-2844 (m) > 517-353-5969 (l) > > On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > >> --> /opt/maker/bin/maker >> >> It looks like most preinstalled software is under /opt on the image. 
>> >> Thanks, >> Carson >> >> >> From: Daniel Standage >> Date: Tuesday, March 25, 2014 at 7:24 AM >> To: Maker Mailing List >> Subject: [maker-devel] Maker iPlant image >> >> Greetings, >> >> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sat Mar 29 10:24:05 2014 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sat, 29 Mar 2014 11:24:05 -0500 Subject: [maker-devel] Missing UTRs in GFF Message-ID: I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file. 
Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- A non-text attachment was scrubbed... Name: missing_utr.gff Type: application/octet-stream Size: 2933 bytes Desc: not available URL: From mhinsley at ebi.ac.uk Mon Mar 31 04:20:10 2014 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Mon, 31 Mar 2014 11:20:10 +0100 Subject: [maker-devel] putative preponderance of short exons?? Message-ID: <5339415A.1020509@ebi.ac.uk> Hi, I've run Maker on a de novo assembly of a species of fly and then ran some simple statistics (intron/exon/CDS length, exons per gene) over the GFF output and compared them with a couple of other species. It all looks good except that there is a surprising number of very short exons (6000 under 50 bp, 3500 under 30 bp, 878 under 10 bp, 87k total; see the attached PDF, where black is Drosophila, red is A. gambiae, and green is with the 5' and 3' exons removed). I ran est2genome and protein2genome, then 3 cycles of Augustus and SNAP. I'm using MAKER 2.31 (unpatched). Anecdotally, these short exons appear without EST or protein evidence, and they all line up with canonical splice sequences (GT----AG), though I've only looked at a few using Apollo. While there's no requirement that exons should be longer, I'm suspicious of this, as there must be some evolutionary relationship between these species. I've compared with another species annotated with MAKER (using SNAP and Augustus), which is more distant (not yet publicly available), and the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop codons, but this does not appear to be the case (mostly they are mid-gene). Is there some way to adjust the predictors, e.g. to require external evidence? Or is there anything else you could suggest? I can see the following in the tutorial, but I'm not sure how they could help:
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
thanks -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom -------------- next part -------------- A non-text attachment was scrubbed... Name: exon_53.pdf Type: application/pdf Size: 10618 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 31 07:52:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 07:52:15 -0600 Subject: [maker-devel] putative preponderance of short exons?? In-Reply-To: <5339415A.1020509@ebi.ac.uk> References: <5339415A.1020509@ebi.ac.uk> Message-ID: The intron/exon structure is determined by SNAP, Augustus, etc. It is not affected by any of the MAKER parameters; only evidence alignments are affected by the MAKER settings. You can try retraining or manually editing the HMMs, but these might also be regions where your assembly is incorrect, and those algorithms create short exons to make a structure work without introducing stop codons mid-gene.
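Malcolm's length bins can be recomputed from a MAKER GFF3 with a few lines of code, which makes it easy to check whether retraining SNAP/Augustus or correcting suspect assembly regions changes the short-exon pattern. The Python sketch below is an illustration under stated assumptions, not a MAKER tool. It assumes nine-column GFF3 in which gene-model exons have type exon and point to their transcript through a Parent attribute, and it stops at a ##FASTA section if one is present. The names parse_exons and short_exon_counts are hypothetical. Setting internal_only=True drops the first and last exon of each transcript, mirroring the "5' and 3' exons removed" series in Malcolm's plot.

```python
from collections import defaultdict


def parse_exons(gff_lines):
    """Map each parent transcript ID to a list of (start, end) exon spans.

    Assumes GFF3: tab-separated, 9 columns, 1-based inclusive coordinates,
    exon-to-transcript links given by Parent=... (comma-separated if several).
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    return exons


def short_exon_counts(exons, cutoffs=(10, 30, 50), internal_only=False):
    """Count exons shorter than each cutoff; also return the total considered."""
    lengths = []
    for spans in exons.values():
        spans = sorted(spans)  # genomic order; dropping both ends is strand-neutral
        if internal_only:
            spans = spans[1:-1]
        lengths.extend(end - start + 1 for start, end in spans)
    counts = {c: sum(1 for n in lengths if n < c) for c in cutoffs}
    return counts, len(lengths)
```

For example, parse_exons(open(path)) on a merged MAKER GFF3 (path being a placeholder) followed by short_exon_counts(exons, internal_only=True) returns the per-cutoff counts and the number of exons considered. Exons shared by several isoforms are counted once per transcript in this sketch.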
Thanks, Carson On 3/31/14, 4:20 AM, "Malcolm Hinsley" wrote: >Hi, > >I've run Maker on a de novo assembly of a species of fly and then ran >some simple statistics (intron/exon/CDS length, exons per gene) over >the GFF output and compared them with a couple of other species. >It all looks good except that there is a surprising number of very short >exons (6000 under 50 bp, 3500 under 30 bp, 878 under 10 bp, 87k total; see the attached >PDF, where black is Drosophila, red is A. gambiae, and green is with the 5' and 3' >exons removed). > >I ran est2genome and protein2genome, then 3 cycles of Augustus and SNAP. >I'm using MAKER 2.31 (unpatched). > >Anecdotally, these short exons appear without EST or protein evidence, >and they all line up with canonical splice sequences (GT----AG), >though I've only looked at a few using Apollo. > >While there's no requirement that exons should be longer, I'm suspicious >of this, as there must be some evolutionary relationship between these >species. >I've compared with another species annotated with MAKER (using SNAP >and Augustus), which is more distant (not yet publicly available), and >the same pattern of short exons is present. >I wondered if they were created to fulfil the need for start/stop >codons, but this does not appear to be the case (mostly they are >mid-gene). > > >Is there some way to adjust the predictors, e.g. to require external >evidence? Or is there anything else you could suggest? I can see the
I can see the >following in the tutorial but I'm not sure how they could help: > >pred_flank=200 #flank for extending evidence clusters sent to gene >predictors >pred_stats=0 #report AED and QI statistics for all predictions as well as >models >AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and >1) >min_protein=0 #require at least this many amino acids in predicted >proteins >alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = >yes, 0 = no >always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 >= no > > >thanks > >-- >malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 >European Bioinformatics Institute (EMBL-EBI) >European Molecular Biology Laboratory >Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD >United Kingdom > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Mar 31 08:37:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 08:37:15 -0600 Subject: [maker-devel] Missing UTRs in GFF In-Reply-To: References: Message-ID: Not something I've seen before, but there was a patch for another issue that was cause by the use of avoid_est_fusion=1, that may be related. Try the current stable release 2.31, and let me know if it still happens. You can also upload the contig folder from one of the regions in question here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Then I could verify the bug, and see if it is something that happens in the current release. --Carson From: Benjamin Rubin Date: Saturday, March 29, 2014 at 10:24 AM To: Subject: [maker-devel] Missing UTRs in GFF I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs. 
For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file. Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 From carson.holt at genetics.utah.edu Mon Mar 3 12:08:49 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 3 Mar 2014 19:08:49 +0000 Subject: [maker-devel] FW: error runinig agustus In-Reply-To: References: Message-ID: Forwarding this to the maker-devel list. On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: >I encountered the following error while running maker (2nd annotation >using gff file of the first maker run and trinity assembled RNA seq as >EST) > >ERROR: Augustus failed >--> rank=NA, hostname=rapa.agr.gc.ca > >Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >am using 2.31 > >Your help is appreciated > > >HB > > > > > From carsonhh at gmail.com Mon Mar 3 12:11:08 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 03 Mar 2014 12:11:08 -0700 Subject: [maker-devel] FW: error runinig agustus Message-ID: You will need to provide more detail. Probably the entire error log and the maker control files. Thanks, Carson On 3/3/14, 12:08 PM, "Carson Holt" wrote: >Forwarding this to the maker-devel list.
> > >On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: > >>I encountered the following error while running maker (2nd annotation >>using gff file of the first maker run and trinity assembled RNA seq as >>EST) >> >>ERROR: Augustus failed >>--> rank=NA, hostname=rapa.agr.gc.ca >> >>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >>am using 2.31 >> >>Your help is appreciated >> >> >>HB >> >> >> >> >> > From sjackman at gmail.com Tue Mar 4 19:10:42 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 4 Mar 2014 18:10:42 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and *three_prime_UTR* features. Ideally the feature type would be set to *rRNA* or *tRNA* as appropriate, and the UTR and CDS features would be omitted. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: > Set single_exon=1, and the minimum size to a smaller value. I think it's > set to 250 right now. Also, est2genome is looking for an ORF, so if there is > none (as with tRNAs) they probably won't get picked up. > > --Carson > > Sent from my iPhone > > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > > Sorry, ignore my previous question.
est_forward also carries forward the > names of protein evidence and works like a charm. Thank you! > > The larger rrn16 and rrn23 genes annotated perfectly, but the smaller > rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They > are in the blastn output, and in the evidence_0.gff. rrn5 has perfect > identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value > (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing > these hits? > > organism_type=prokaryotic > est2genome=1 > protein2genome=1 > est_forward=1 > > Cheers, > Shaun > > > On 27 February 2014 15:17, Shaun Jackman wrote: > >> Is there a corresponding protein_forward=1 option to map forward protein >> names from protein2genome? >> >> Cheers, >> Shaun >> >> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) >> wrote: >> >> Sorry, I meant to say: prefilter on the score in the mRNA column before >> passing the GFF3 to model_gff. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >> >> What you can do is run it once with just est_forward=1 and >> est2genome/protein2genome set to 1. Then take those results, pass them in >> as model_gff and use the map_forward option to then filter the results >> based on mRNA score, and that would copy names onto new genes under the >> standard MAKER pipeline. Eventually it's really supposed to go into a >> separate tool that will map genes onto new assemblies (but under the hood >> the tool will just be calling MAKER with certain parameters restricted). I >> do this because if people commonly use it mixed with things like SNAP, I can >> start to get some very weird behaviors.
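Carson's suggestion above (run once with est_forward=1, then prefilter on the mRNA score before passing the GFF3 back in as model_gff with map_forward) can be sketched in Python. This is a minimal illustration, not part of MAKER: it assumes the score of interest sits in column 6 (the standard GFF3 score column) of each mRNA line and that child features point to their mRNA through Parent=; the function names and threshold are hypothetical. It keeps each passing mRNA, its parent gene, and its children (exons, CDS, UTRs); everything else, including ## directives and a trailing FASTA section, is dropped.

```python
from typing import Dict, Iterable, List, Set


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as "ID=m1;Parent=g1" into a dict."""
    attrs: Dict[str, str] = {}
    for part in column9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def filter_by_mrna_score(lines: Iterable[str], min_score: float) -> List[str]:
    """Keep mRNAs scoring >= min_score, plus their parent genes and children.

    Comment/directive lines, FASTA sequence lines and any feature that is not
    a gene, a passing mRNA, or a child of a passing mRNA are dropped.
    """
    rows: List[List[str]] = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # skips blank lines and FASTA sequence lines
            rows.append(cols)

    # Pass 1: decide which mRNAs (and therefore which genes) survive.
    kept_mrnas: Set[str] = set()
    kept_genes: Set[str] = set()
    for cols in rows:
        if cols[2] != "mRNA" or cols[5] == ".":
            continue  # unscored mRNAs are treated as failing
        if float(cols[5]) >= min_score:
            attrs = parse_attributes(cols[8])
            kept_mrnas.add(attrs.get("ID", ""))
            kept_genes.update(attrs.get("Parent", "").split(","))
    kept_mrnas.discard("")
    kept_genes.discard("")

    # Pass 2: emit surviving features in their original order.
    out: List[str] = []
    for cols in rows:
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            keep = attrs.get("ID", "") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID", "") in kept_mrnas
        else:
            # exons, CDS, UTRs: keep if any listed parent mRNA passed
            keep = bool(set(attrs.get("Parent", "").split(",")) & kept_mrnas)
        if keep:
            out.append("\t".join(cols) + "\n")
    return out
```

A reasonable next step would be to write the returned lines under a "##gff-version 3" header and point model_gff at that file. Features shared by several mRNAs are kept if any one parent passes, so their Parent= lists may still name dropped mRNAs.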
>> >> Thanks, >> Carson >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 3:04 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> It seems that this could be a very useful option in those cases where >> you have firm a priori knowledge of the placement of ESTs. However, while >> trying it I noticed that est_forward implicitly turns on the est2genome >> predictor. Is this necessary for this to work? I'm after the >> behavior you describe below, where exonerate is made to try really hard >> within a limited region to align an EST, but I would not like MAKER to >> produce est2genome predictions. >> >> In general, I think this maker_coor and est_forward feature set >> is worth promoting to a documented feature. >> >> Thanks, >> Mikael >> >> On 26 Feb 2014, at 17:09, Carson Holt wrote: >> >> It will still work without est_forward. It just works a little >> differently. Keep in mind this was a hidden feature I used to find >> stubborn or hard-to-find missing genes after reassembly of a genome. >> >> If est_forward is provided, MAKER will parse the database to look for the >> maker_coor tags early in the pipeline. Then it will create a list of >> locations to search, and it will search them even if there are no BLAST >> results to seed the search (normally MAKER gets a BLAST result first and >> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >> look for a match using all of chr1 as the input to exonerate even when >> BLAST finds nothing (this is a very, very slow search, but can help pick up >> one or two stubborn genes that don't remap well). To allow this, MAKER >> gives exonerate looser matching parameters (i.e. it allows for single base >> pair introns, perhaps caused by assembly errors).
The logic here is that, >> given that I have already told MAKER that with some degree of >> confidence I expect sequence A to map to location X, it will try its >> hardest to make it match. >> >> Without est_forward set, the maker_coor= flag still gets read in GI.pm at >> line 1563, but only after a BLAST alignment has already seeded it to the >> region (that BLAST result has the information in its description >> parameter). MAKER will then ignore seeds completely outside of maker_coor. >> In addition, any BLAST seeds that overlap maker_coor will get the search >> space for alignment polishing adjusted to match maker_coor exactly. Also, >> match parameters for exonerate will not be relaxed as they were with >> est_forward. >> >> As you can see, the behavior is slightly different (because it's an >> accidental feature). >> >> Thanks, >> Carson >> >> >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 6:37 AM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> That might be a useful and time-saving accidental feature. But, reading >> the code, it seems that I need to supply maker_coor but not gene_id, as >> well as the configuration option est_forward, for this to work. All >> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, >> right? >> >> Mikael >> >> On 26 Feb 2014, at 14:22, Carson Holt wrote: >> >> Yes. That should also work, as an accidental feature. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote: >> >> Can this use of maker_coor be used only to hint about the placement of >> the ESTs, without affecting the naming of the final genes? I.e., if I have a >> database of ESTs where I have a priori knowledge of their rough placement, >> can this placement be given to MAKER without providing est_forward=1? >> >> Thanks, >> Mikael >> >>
On 26 Feb 2014, at 01:58, Carson Holt wrote: >> >> There is a way. It's not a standard option and it's undocumented, but >> if you add est_forward=1 to the maker_opts.ctl file, then it will do just >> that. The option won't already be there, so you'll have to type it in. >> >> There is also a feature designed to work with this option. If you add >> tags to your FASTA headers, those can be used to guide the mapping and >> naming. For example, gene_id= will ensure that different isoforms >> sharing a common gene_id get clustered into the same gene, >> and maker_coor=chr1:1-10000 in the FASTA header will force a particular >> sequence to only be mapped against chr1 within the range of 1-10000 bp, while >> just using maker_coor=chr1 will force it to only be mapped against chr1. >> >> This is an undocumented way to remap genes onto new assemblies using >> BLAST alignments of earlier transcript or protein annotations as a guide. >> >> --Carson >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, February 25, 2014 at 5:06 PM >> To: >> Subject: [maker-devel] Mapping gene names >> >> Hi, >> >> I'm annotating a genome using a closely related genome from GenBank, >> using the .frn (RNA) and .faa (protein) files from GenBank as evidence to >> annotate my genome. I've run MAKER, and the annotation seems to have worked >> well. Is it possible to map the names of the genes from the related species >> to my annotation? I see the map_forward option, which applies to the >> model_gff parameter. Is there a similar option for est and protein?
>> >> maker_opts.ctl >> >> est=NC_123456.frn >> protein=NC_123456.faa >> est2genome=1 >> protein2genome=1 >> >> Thanks, >> Shaun From carsonhh at gmail.com Tue Mar 4 19:33:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 04 Mar 2014 19:33:12 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Trying to call non-coding RNA from ESTs or even from sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so for the most part MAKER doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So, just like other prediction tools (SNAP, Augustus, etc.), MAKER's primary focus has always been the coding genes. We've only recently started adding non-coding RNA support, for iPlant, so it's still relatively immature. Thanks, Carson
From felix.bemm at uni-wuerzburg.de Wed Mar 5 09:35:33 2014 From: felix.bemm at uni-wuerzburg.de (Felix Bemm) Date: Wed, 05 Mar 2014 17:35:33 +0100 Subject: [maker-devel] Build Issues - v2.31 Message-ID: <53175255.4050102@uni-wuerzburg.de> Hi, I am trying to build maker version 2.31.
Got the following error: Configuring MAKER with MPI support 'CCFLAGSEX' is not a valid config option for Inline::C at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 236 at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 256 Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', '/software/mpich2-1.5rc3/include', 'blib', '') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277 MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469 MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at ./Build line 70 The same procedure worked with 2.29-beta! Any ideas? Felix -- Felix Bemm Department of Bioinformatics University of Würzburg, Germany Tel: +49 931 - 31 83696 Fax: +49 931 - 31 84552 felix.bemm at uni-wuerzburg.de From carsonhh at gmail.com Wed Mar 5 09:40:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 09:40:05 -0700 Subject: [maker-devel] Build Issues - v2.31 In-Reply-To: <53175255.4050102@uni-wuerzburg.de> References: <53175255.4050102@uni-wuerzburg.de> Message-ID: You need to update your Inline::C module. The CCFLAGSEX option was added to Inline::C a couple of years ago to allow users to pass extra flags to the compiler. Thanks, Carson
From carson.holt at genetics.utah.edu Wed Mar 5 12:02:26 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 5 Mar 2014 19:02:26 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: On 3/5/14, 11:59 AM, "Borhan, Hossein" wrote: >Dear MAKER users, > >I want to run MAKER on a fungal genome of about 45 Mb, about 1/3 of >which is repeat rich. But most of the virulence genes are located >within the repeat regions, flanked by stretches of repeats. I am not sure >whether, if I use the repeat masker option, I am going to miss out on the >prediction of these virulence genes located within the repeats. > >Other concerns with the settings in the maker_opts file for fungal genomes are: > >single_exon=0: should this be changed to 1, since single-exon genes >are quite common in fungi, and what is the consequence of this when using ESTs >and assembled RNA as evidence for gene prediction? > >correct_est_fusion=0 #limits use of ESTs in annotation >to avoid fusion genes. As I understand it, this option will remove the >overlapping UTRs, but what is the consequence of setting this option on >the use of ESTs for predicting ORFs? > >Thanks, > >HB From carsonhh at gmail.com Wed Mar 5 12:17:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 12:17:57 -0700 Subject: [maker-devel] FW: maker-control file Message-ID: Not using repeat masking will cause many problems.
Besides, a gene being flanked by repeats does not mean it will be lost: any evidence/alignments that can seed in non-repetitive regions (gene/exon) are still allowed to extend into repetitive regions during the polishing stage (aligners have two stages: seed and extend). So transposons should never seed, but genes will, because their sequence will contain non-repetitive regions (even if they are near repeats). single_exon should be set to 1 for fungi; just make sure to set the minimum length of single-exon evidence to something reasonable, like 250 bp. correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained, it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set and not est2genome (which should already have been turned off by then). Thanks, Carson
From marc.hoeppner at imbim.uu.se Thu Mar 6 00:26:29 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 07:26:29 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Hi, I think this is an interesting comment, and I would like some more information on it: correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained, it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set and not est2genome (which should already have been turned off by then).
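The two-stage workflow in the advice quoted above (est2genome only for building training models, then est2genome off, trained predictors on, and correct_est_fusion=1; single_exon=1 with a minimum single-exon evidence length around 250 bp for fungi) can be captured as two sets of control-file overrides. The Python sketch below is illustrative only: est2genome, protein2genome, correct_est_fusion, single_exon and single_length appear in this thread, while snaphmm and augustus_species are assumed to be the corresponding predictor options in the generated maker_opts.ctl. The HMM path, species name, and the choice to also disable protein2genome in the final round are hypothetical.

```python
from typing import Dict, List

# Overrides to merge into maker_opts.ctl, not a complete control file.
TRAINING_ROUND: Dict[str, str] = {
    "est2genome": "1",          # rough models, only for training predictors
    "protein2genome": "1",
    "correct_est_fusion": "0",  # not combined with est2genome
    "single_exon": "1",         # single-exon genes are common in fungi
    "single_length": "250",     # minimum length of single-exon evidence
}

FINAL_ROUND: Dict[str, str] = {
    "est2genome": "0",                # turned off once predictors are trained
    "protein2genome": "0",            # assumption: disabled by analogy
    "snaphmm": "fungus.hmm",          # hypothetical trained SNAP HMM path
    "augustus_species": "my_fungus",  # hypothetical Augustus species name
    "correct_est_fusion": "1",
    "single_exon": "1",
    "single_length": "250",
}


def check_round(opts: Dict[str, str]) -> List[str]:
    """Return warnings for combinations the thread advises against."""
    warnings: List[str] = []
    if opts.get("est2genome") == "1" and opts.get("correct_est_fusion") == "1":
        warnings.append("correct_est_fusion should not be combined with est2genome")
    if opts.get("single_exon") == "1" and "single_length" not in opts:
        warnings.append("set single_length when single_exon=1")
    return warnings


def render_ctl(opts: Dict[str, str]) -> str:
    """Render overrides as key=value lines, preserving insertion order."""
    return "".join(f"{key}={value}\n" for key, value in opts.items())
```

Printing render_ctl(FINAL_ROUND) would give the lines to paste over the defaults in a freshly generated control file; keeping the rounds as data makes the est2genome and correct_est_fusion conflict easy to check before a run.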
My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome (with Augustus achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. I'm not sure what to think about that, to be honest. Is it possible to get some more details on how MAKER uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that MAKER gives higher weight to the ab initio predictions, which seems problematic to me. /Marc From carsonhh at gmail.com Thu Mar 6 07:29:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 07:29:35 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: > Wouldn't it be more sensible to rely on the evidence over probabilistic > models? Yes. In fact, that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic).
Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (i.e. A. thaliana to annotate C. elegans), you get as much as a 3-fold increase in exon-level accuracy by using the hint feedback from MAKER. With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). > The annotation would be partial, but on the other hand the chance of > incorporating false signals is smaller (assuming I can generate a clean set > of transcripts from RNA-seq data)? False signals are abundant. It's just the nature of how ESTs, and especially mRNA-seq reads, are generated and anchored back to the assembly. By allowing feedback between the probabilistic model and the evidence (both protein and EST/mRNA-seq), a lot of this is eliminated. > As an example, using SNAP and Augustus on a bird genome (with Augustus > achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of > false exons that were simply not supported by the RNA-seq data, yet made it > into the final gene build. You will get false positives from the est2genome-alone approach as well. Models will be more partial, and the false negative rate will be very high (often 30-70%). Also look at Figure 1 of the MAKER2 paper. The false positive rate from ab initio prediction alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). > Is it possible to get some more details on how MAKER uses ab initio predictions > and reconciles them with evidence alignments? At the moment it seems to me > that MAKER gives higher weight to the ab initio predictions, which seems > problematic to me. Take a look at the MAKER, MAKER2, and MAKER-P papers.
Final genes are chosen based off of evidence overlap using AED (completely evidence based). It is the model generation that leverages the hint based feedback. The names of MAKER genes can let you know what the source of the model is. Any time hint based models match the evidence better the name will have hame like this ?> maker---gene- (I.e. maker-chr1-snap-gene-0.4) When the ab initio model matches better than the hint based model the name is like this ?> --abinit-gene- (I.e. snap-chr1-abinit-gene-0.2) In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models. Thanks, Carson From: Marc H?ppner Date: Thursday, March 6, 2014 at 12:26 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] FW: maker-control file Hi, I think this is an interesting comment that I would like a few more information on: > > correct_est_fusion should not be used together with est2genome. It won?t > fail, you just get odd results. Actually est2genome should not ever be > used to generate the final annotation set. It is a convenience method > that allows you to generate rough models for training gene predictors like > SNAP and Augustus. But once they are trained it should be turned off, > because the models it produces will be partial (Ests rarely cover the > whole transcript) and the results will have many false potties from > background transcription events from your EST data. These models are good > enough to train with, but make very poor final annotations. So in the end > you should be using correct_est_fusion=1 with the SNAP pr Augustus set and > not est2genome (which should already have been turned off by then). > My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. 
And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.

As an example, using SNAP and Augustus on a bird genome - with Augustus achieving nucleotide and exon sensitivities in the 70-90% range - gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab initio predictions, which to me seems problematic.

/Marc

From marc.hoeppner at imbim.uu.se Thu Mar 6 07:40:48 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 14:40:48 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To:
References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>

Hi Carson,

Thanks for the detailed feedback, this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially with newer protocols that give near-perfect strandedness. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by letting Maker work with a semi-refined ab initio model.
But it can be a bit tricky to hit that sweet spot (we did validate more than 4,000 models manually in order to make that sort of assessment, though). But I will have another look at this and see if I can get Maker to do what I need with the approach you describe.

That reminds me, I think it would be fantastic if you guys could put together a wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that deserve a proper explanation and have only ever been discussed here on this list: best practices, experimental features, etc.

Regards,
Marc
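Carson notes above that MAKER picks final genes by evidence overlap using AED (annotation edit distance). As a rough illustration of what that score measures, here is a minimal sketch of the published AED definition, AED = 1 - (SN + SP) / 2, computed at the nucleotide level. It assumes exons are half-open (start, end) intervals on one strand and that all evidence for the locus is pooled into one interval set; it is not MAKER's own implementation, which also has to decide which ESTs and proteins belong to each locus.

```python
from typing import Iterable, List, Tuple

# Assumed representation: exons as half-open (start, end) intervals
# on one strand of one sequence.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals, dropping empty ones."""
    merged: List[Interval] = []
    for start, end in sorted(iv for iv in intervals if iv[1] > iv[0]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Shared bases between two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def aed(prediction: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """AED = 1 - (SN + SP) / 2; 0 means perfect agreement, 1 means no overlap."""
    pred = merge_intervals(prediction)
    evid = merge_intervals(evidence)
    if not pred or not evid:
        return 1.0
    shared = overlap_length(pred, evid)
    sensitivity = shared / total_length(evid)  # fraction of evidence covered by the model
    specificity = shared / total_length(pred)  # fraction of the model supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0
```

For example, a 100 bp model whose first 50 bp are covered by a 50 bp EST alignment has SN = 1.0 and SP = 0.5, so its AED is 0.25. Lower is better, which is why a hint-based model that tracks the evidence closely can win over an ab initio model with extra, unsupported exons.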
From carsonhh at gmail.com Thu Mar 6 08:03:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 08:03:10 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
Message-ID:

MAKER wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page

Thanks,
Carson

From sjackman at gmail.com Thu Mar 6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson.

I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward.
It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name, or perhaps prefix, the genes with that convention?

Cheers,
Shaun

On 2014-March-04 at 18:33:20, Carson Holt (carsonhh at gmail.com) wrote:
Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (SNAP, Augustus, etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and the UTR and CDS features would be omitted. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this.
Cheers,
Shaun

On 27 February 2014 17:13, Carson Holt wrote:
Set single_exon=1, and set the minimum size (single_length) to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun

On 27 February 2014 15:17, Shaun Jackman wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors.
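Carson's corrected advice is to prefilter on the score in the mRNA column before passing the GFF3 to model_gff. A minimal sketch of such a prefilter follows. It assumes the score sits in GFF3 column 6 of each mRNA line, that genes, mRNAs and their exon/CDS/UTR children are linked by ID/Parent attributes, and that the threshold is something you choose; the function names are hypothetical helpers, not part of MAKER.

```python
from typing import Dict, List, Tuple

Record = Tuple[List[str], Dict[str, str]]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=g1-RA;Parent=g1'."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def read_features(lines: List[str]) -> List[Record]:
    records: List[Record] = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # MAKER GFF3 files may end with an embedded FASTA section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            records.append((cols, parse_attributes(cols[8])))
    return records


def filter_by_mrna_score(lines: List[str], min_score: float) -> List[str]:
    """Keep mRNAs whose column-6 score >= min_score, plus their genes and children."""
    records = read_features(lines)
    kept_mrnas, kept_genes = set(), set()
    for cols, attrs in records:
        if cols[2] == "mRNA" and cols[5] != "." and "ID" in attrs:
            if float(cols[5]) >= min_score:
                kept_mrnas.add(attrs["ID"])
                kept_genes.update(p for p in attrs.get("Parent", "").split(",") if p)

    out = ["##gff-version 3"]
    for cols, attrs in records:
        parents = {p for p in attrs.get("Parent", "").split(",") if p}
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:  # exon, CDS, UTRs: keep only if attached to a kept mRNA
            keep = bool(parents & kept_mrnas)
        if keep:
            out.append("\t".join(cols))
    return out
```

mRNAs with a "." score are dropped on purpose, and evidence features (BLAST/exonerate matches) are not carried over, since model_gff is meant to hold gene models only. The filtered lines can then be written to a file and referenced from model_gff alongside map_forward=1.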
Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think this maker_coor and est_forward feature set is worthy of being promoted to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt wrote:
It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns, perhaps caused by assembly errors). The logic here is that since I already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match.
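The first step Carson describes for est_forward, reading maker_coor tags from the evidence FASTA headers and building a list of locations to search, can be illustrated with a small sketch. It assumes tags are whitespace-separated tokens of the form maker_coor=chr1 or maker_coor=chr1:1-10000 (coordinates exactly as written in the thread), that a bare sequence ID means the whole sequence, and that sequence lengths come from your assembly; the helper names are hypothetical, not MAKER's GI.pm code.

```python
import re
from typing import Dict, List, Optional, Tuple

# maker_coor=chr1 or maker_coor=chr1:1-10000 (coordinates as written in the thread)
MAKER_COOR = re.compile(r"maker_coor=([^\s:;]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header: str) -> Optional[Tuple[str, Optional[int], Optional[int]]]:
    """Return (seq_id, start, end) from a FASTA header, or None if untagged."""
    match = MAKER_COOR.search(header)
    if match is None:
        return None
    seq_id, start, end = match.groups()
    if start is None:
        return seq_id, None, None
    return seq_id, int(start), int(end)


def search_regions(headers: List[str], seq_lengths: Dict[str, int]) -> List[Tuple[str, str, int, int]]:
    """List (query, seq_id, start, end) regions to search, whether or not BLAST seeds them."""
    regions = []
    for header in headers:
        coor = parse_maker_coor(header)
        if coor is None:
            continue  # untagged: left to the normal BLAST-seeded path
        seq_id, start, end = coor
        if seq_id not in seq_lengths:
            continue  # tag names a sequence absent from this assembly
        if start is None:
            start, end = 1, seq_lengths[seq_id]  # bare seq_id means the whole sequence
        query = header.lstrip(">").split()[0]
        regions.append((query, seq_id, start, end))
    return regions
```

The whole-sequence case is what makes the search "very, very slow": a bare maker_coor=chr1 turns into a region as long as chr1 itself.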
Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt wrote:
Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:
Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e. if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt wrote:
There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.
If you add tags to your FASTA headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure that different isoforms sharing a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the FASTA header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.

Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, February 25, 2014 at 5:06 PM
To:
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
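For a case like Shaun's, where the reference placement of each RNA or protein is already known, the gene_id= and maker_coor= tags Carson describes have to be written into the evidence FASTA headers before the run. A minimal sketch of that preprocessing step follows. It assumes tags are appended to the header as space-separated key=value tokens, and the tag values and the `tags` mapping you pass in are hypothetical inputs you would derive from your own reference annotation.

```python
from typing import Dict, Optional


def tag_header(header: str, gene_id: Optional[str] = None, seq_id: Optional[str] = None,
               start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Append gene_id= and maker_coor= tags to a FASTA header line."""
    parts = [header.rstrip()]
    if gene_id is not None:
        parts.append(f"gene_id={gene_id}")
    if seq_id is not None:
        if (start is None) != (end is None):
            raise ValueError("give both start and end, or neither")
        coor = seq_id if start is None else f"{seq_id}:{start}-{end}"
        parts.append(f"maker_coor={coor}")
    return " ".join(parts)


def tag_fasta(text: str, tags: Dict[str, Dict]) -> str:
    """Rewrite headers whose record ID appears in `tags`; sequence lines pass through."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            if record_id in tags:
                newline = "\n" if line.endswith("\n") else ""
                line = tag_header(line.rstrip("\n"), **tags[record_id]) + newline
        out.append(line)
    return "".join(out)
```

Records without an entry in `tags` are left untouched, so they fall back to ordinary BLAST-seeded mapping.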
URL: From carsonhh at gmail.com Thu Mar 6 13:58:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 13:58:41 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Yes. I?ll fix the naming. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 1:56 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straight forward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding. The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention? Cheers, Shaun On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: > Trying to call non-coding RNA from ESTs or even sequence homology is extremely > messy (non-trivial problem in most organisms with high false positive rate), > so MAKER for the most part doesn?t even try to do that. It focuses only on > the coding genes. You can now use tRNAscan and snoscan in the newest version > for some non-coding RNA support (those features were only added a couple of > months ago). So just like other prediction tools (snap, augustus etc.), the > primary focus has always been the coding genes. 
We?ve only started adding > non-coding RNA support recently for iPlant, so it?s still relatively immature. > > Thanks, > Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Tuesday, March 4, 2014 at 7:10 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the > tip. > > The rRNA genes that are found with est2genome have the feature type set to > mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. > Ideally the feature type would be set to rRNA or tRNA as appropriate, and > would omit the UTR and CDS features. Is that a feature that you would be > interested in adding to MAKER? The rRNA gene names all start with ?rrn? and > the tRNA gene names with ?trn?, as is standard, so determining the appropriate > type should be straight forward. > > Thanks again for your help with this. Cheers, > Shaun > > > > On 27 February 2014 17:13, Carson Holt wrote: >> Set single_exon=1, and the minimum size to a smaller value. I think it's set >> to 250 right now. Also est2genome is looking for ORF, so if there is none >> (as with tRNAs) they probably won't get picked up. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >> >>> Sorry, ignore my previous question. est_forward also carries forward the >>> names of protein evidence and works like a charm. Thank you! >>> >>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 >>> and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in >>> the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, >>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < >>> eval_blastn=1e-10). How should I debug which filter is removing these hits? 
>>> organism_type=prokaryotic >>> est2genome=1 >>> protein2genome=1 >>> est_forward=1 >>> Cheers, >>> Shaun >>> >>> >>> >>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>> Is there a corresponding protein_forward=1 option to map forward protein >>>> names from protein2genome? >>>> >>>> Cheers, >>>> Shaun >>>> >>>> >>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>> ) wrote: >>>>> >>>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>>> passing the gff3 to model_gff. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>> >>>>>> What you can do is run it once with just est_forward=1 and >>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>> in as model_gff and use the map_forward option to then filter the results >>>>>> based on mRNA score and that would copy names onto new gene under the >>>>>> standard MAKER pipeline. Eventually it?s really supposed to go into a >>>>>> separate tool that will map genes onto new assemblies (but under the hood >>>>>> the tool will just be calling MAKER with certain parameters restricted). >>>>>> I do this because if people commonly use it mixed with things like SNAP I >>>>>> can start to get some very weird behaviors. >>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> From: Mikael Brandstr?m Durling >>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>> To: Carson Holt >>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>> >>>>>> It seems that this could be a very useful option in those cases where you >>>>>> have firm a priori knowledge of the placement of ESTs. However, while >>>>>> trying it I note that est_forward implies that the est2genome predictor >>>>>> is turned on, implicitly. Is this necessary for this to work? 
I?m after >>>>>> the behavior you describe below where exonerate is made to try really >>>>>> hard within a limited region to align an est, but I would not like maker >>>>>> to produce est2genome predictions. >>>>>> >>>>>> In general, I think this maker_coor and est_forward is a feature set that >>>>>> is worthy to be promoted into a documented feature. >>>>>> >>>>>> THanks, >>>>>> Mikael >>>>>> >>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>>> >>>>>>> It will still work without est_forward. It just works a little >>>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>>> stubborn or hard to find missing genes after reassembly of a genome. >>>>>>> >>>>>>> If est_forward is provided, MAKER will parse the database to look for >>>>>>> the maker_coor tags early in the pipeline. Then it will create a list >>>>>>> of locations to search, and it will search them even if there are no >>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result >>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will >>>>>>> cause MAKER to look for a match using all of chr1 as the input to >>>>>>> exonerate even when BLAST finds nothing (this is a very very slow >>>>>>> search, but can help pick up one or two stubborn genes that don?t remap >>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters >>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly >>>>>>> errors). The logic here is that given the fact that I already told >>>>>>> MAKER that with some degree of confidence I expect sequence A to map to >>>>>>> to location X, it will try its hardest to make it match. >>>>>>> >>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to >>>>>>> the region (that BLAST result has the information in its description >>>>>>> parameter). 
MAKER will then ignore seeds completely outside of >>>>>>> maker_coor. In addition, any BLAST seeds that overlap maker_coor will get >>>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>>> exactly. Also, match parameters for exonerate will not be relaxed as >>>>>>> they were with est_forward. >>>>>>> >>>>>>> As you can see, the behavior is slightly different (because it's an >>>>>>> accidental feature). >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Mikael Brandström Durling >>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> That might be a useful and time-saving accidental feature. But, reading >>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>>> well as the configuration option est_forward for this to work. Any >>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on >>>>>>> est_forward=1, right? >>>>>>> >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>>>>>> >>>>>>> Yes. That should work as well as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have >>>>>>> a database of ESTs where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to maker without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>>>>>> >>>>>>> There is a way. It's not a standard option and it's undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that.
The option won't already be there, so you'll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your fasta headers, those can be used to guide the mapping and >>>>>>> naming. For example, gene_id= will ensure different >>>>>>> isoforms that share a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, >>>>>>> and just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> BLAST alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm annotating a genome using a closely related genome from GenBank, >>>>>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence >>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein?
> >>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carson.holt at genetics.utah.edu Thu Mar 6 16:00:40 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 6 Mar 2014 23:00:40 +0000 Subject: [maker-devel] maker problem with running blast In-Reply-To: References: Message-ID: Your blast_type parameter in maker_bopts.ctl is set to 'wublast', but the executables for wublast are blank in maker_exe.ctl. See, they're blank --> xdformat=#location of WUBLAST xdformat executable blasta=#location of WUBLAST blasta executable You either need to provide executables or set your blast_type parameter to something else. For example, you could set it to 'NCBI+', but you will need to fix the location of makeblastdb, which is set incorrectly here (it points to the ncbi-blast-2.2.29+ directory rather than to the makeblastdb executable) --> makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable Alternatively, you can set blast_type to 'NCBI', but you will need to uncomment the executables.
Here --> formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable blastall=#/usr/local/bin/blastall #location of NCBI blastall executable --Carson On 3/6/14, 3:51 PM, "Borhan, Hossein" wrote: >Hi > >I have installed the latest version of BLAST+ and provided the executable paths >in maker_exe.ctl as follows: > >#-----Location of Executables Used by MAKER/EVALUATOR >makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable >blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location of NCBI+ blastn executable >blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location of NCBI+ blastx executable >tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx #location of NCBI+ tblastx executable >formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable >blastall=#/usr/local/bin/blastall #location of NCBI blastall executable >xdformat=#location of WUBLAST xdformat executable >blasta=#location of WUBLAST blasta executable >RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of RepeatMasker executable >exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate #location of exonerate executable > >#-----Ab-initio Gene Prediction Algorithms >snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable >gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3 #location of eukaryotic genemark executable >gmhmmp= #location of prokaryotic genemark executable >augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus executable >fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable > >#-----Other Algorithms >fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom executable (experimental) >probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild #location of probuild executable (required for genemark) > >But when running maker I get this error: > >STATUS: Parsing
control files... >WARNING: blast_type is set to 'wublast' but executables cannot be located >ERROR: Please provide a valid locaction for a BLAST algorithm in the >control files. From sjackman at gmail.com Thu Mar 6 16:33:04 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 6 Mar 2014 15:33:04 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: Yes. I'll fix the naming. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 1:56 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding. The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
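A minimal Python sketch of the naming convention Shaun describes, for anyone renaming tRNA genes downstream. It assumes the amino acid arrives as a three-letter code and the anticodon as a string (tRNAscan-SE reports both, but extracting them from MAKER output is not shown); trna_name and THREE_TO_ONE are hypothetical names, not part of MAKER:

```python
# Hypothetical helper (not part of MAKER or tRNAscan-SE): build a conventional
# tRNA gene name such as "trnW-CCA" from a three-letter amino-acid code and an
# anticodon. The code table and keeping the anticodon as given (only
# upper-cased) are assumptions.

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "SeC": "U",  # selenocysteine
}


def trna_name(amino_acid: str, anticodon: str) -> str:
    """Return a name like 'trnW-CCA'; raise ValueError for unknown codes."""
    letter = THREE_TO_ONE.get(amino_acid)
    if letter is None:
        raise ValueError(f"unknown amino-acid code: {amino_acid!r}")
    return f"trn{letter}-{anticodon.upper()}"


if __name__ == "__main__":
    print(trna_name("Trp", "CCA"))  # trnW-CCA
    print(trna_name("Leu", "taa"))  # trnL-TAA
```

The anticodon is kept as supplied (only upper-cased) rather than converted between U and T spellings, since the thread does not say which form is wanted.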
Cheers, Shaun On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false-positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus, etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with 'rrn' and the tRNA gene names with 'trn', as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson Sent from my iPhone On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
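Carson's suggestion earlier in this thread is to run once with est_forward=1, then prefilter the resulting GFF3 on the mRNA score before passing it back in as model_gff with map_forward. A minimal Python sketch of such a prefilter follows. It assumes the score sits in column 6 of the mRNA lines; MAKER may record other quality measures (for example _AED) in the attributes instead, in which case the parsing would need adjusting. The function names and threshold are hypothetical:

```python
# Minimal sketch (not part of MAKER): keep only gene models whose mRNA line has
# a numeric score (GFF3 column 6) at or above a threshold, so the filtered file
# can be passed back in as model_gff. Assumes the score sits in column 6; mRNAs
# with "." there are treated as failing. Everything after ##FASTA passes through.


def parse_attrs(field):
    """Parse a GFF3 attribute column into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(lines, min_score):
    # Pass 1: collect passing mRNA IDs and their parent gene IDs.
    kept_mrnas, kept_genes = set(), set()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        try:
            score = float(cols[5])
        except ValueError:
            continue  # "." or another non-numeric score
        if score >= min_score:
            attrs = parse_attrs(cols[8])
            kept_mrnas.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))

    # Pass 2: emit comments, kept genes/mRNAs and children of kept mRNAs.
    out, in_fasta = [], False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        cols = line.rstrip("\n").split("\t")
        if in_fasta or line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        attrs = parse_attrs(cols[8])
        parents = set(attrs.get("Parent", "").split(",")) - {""}
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:
            # Child features follow their mRNA; parentless features pass through.
            keep = not parents or bool(parents & kept_mrnas)
        if keep:
            out.append(line)
    return out
```

Genes keep only their passing mRNAs. Features without a Parent pass through unchanged, and any other child feature is kept only if one of its parents is a kept mRNA.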
The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5, rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: Sorry, I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors. Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work?
I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy of being promoted to a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
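The seed handling described above for runs without est_forward can be sketched in Python as follows. This only mirrors Carson's description, not the actual code in GI.pm, and Region, parse_maker_coor and polish_window are hypothetical names:

```python
# Illustrative sketch of the seed handling described above when est_forward is
# NOT set: a maker_coor tag ("chr1" or "chr1:1-10000") restricts where a BLAST
# seed may be polished. Mirrors the description in the thread; it is not
# MAKER's GI.pm code, and all names here are hypothetical.
import re
from typing import NamedTuple, Optional

_COOR = re.compile(r"^([^:\s]+)(?::(\d+)-(\d+))?$")


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means the whole sequence
    end: Optional[int]


def parse_maker_coor(tag: str) -> Region:
    """Parse 'chr1' or 'chr1:1-10000' into a Region."""
    m = _COOR.match(tag)
    if m is None:
        raise ValueError(f"bad maker_coor value: {tag!r}")
    seqid, start, end = m.groups()
    if start is None:
        return Region(seqid, None, None)
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start after end in maker_coor: {tag!r}")
    return Region(seqid, start, end)


def polish_window(coor: Region, seqid: str, start: int, end: int) -> Optional[Region]:
    """Return the polishing window for a BLAST seed, or None to drop the seed.

    Seeds completely outside maker_coor are ignored; seeds that overlap it get
    a search space equal to maker_coor itself.
    """
    if seqid != coor.seqid:
        return None
    if coor.start is not None and (end < coor.start or start > coor.end):
        return None
    return coor
```

With est_forward=1 the thread describes a different path (searching the whole maker_coor region even without a BLAST seed, with relaxed exonerate parameters), which this sketch does not model.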
As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.
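A hypothetical Python helper for adding these header tags in bulk. The tag names come from the message above; writing them as space-separated key=value pairs after the existing header text is an assumption, so it is worth confirming on a small test run that MAKER picks them up:

```python
# Hypothetical helper (not part of MAKER): append the gene_id= and maker_coor=
# tags described above to FASTA headers before using the file as est= or
# protein= evidence with est_forward=1. Writing the tags as space-separated
# key=value pairs after the existing header text is an assumption.


def tag_header(seq_id, description="", gene_id=None, seqid=None, start=None, end=None):
    """Build a header like '>tx1 desc gene_id=g1 maker_coor=chr1:1-10000'."""
    parts = [f">{seq_id}"]
    if description:
        parts.append(description)
    if gene_id:
        parts.append(f"gene_id={gene_id}")
    if seqid:
        coor = seqid if start is None else f"{seqid}:{start}-{end}"
        parts.append(f"maker_coor={coor}")
    return " ".join(parts)


def tag_fasta(lines, tags):
    """Yield FASTA lines, tagging headers whose ID is a key of `tags`."""
    for line in lines:
        if line.startswith(">"):
            seq_id, _, desc = line[1:].rstrip("\n").partition(" ")
            if seq_id in tags:
                yield tag_header(seq_id, description=desc, **tags[seq_id]) + "\n"
                continue
        yield line


if __name__ == "__main__":
    records = [">tx1 16S ribosomal RNA\n", "ACGT\n", ">tx2\n", "GGCC\n"]
    tags = {"tx1": {"gene_id": "rrn16", "seqid": "chr1", "start": 1, "end": 10000}}
    print("".join(tag_fasta(records, tags)), end="")
```

Sequences without an entry in the tag mapping are passed through untouched, so the helper can be run over a whole evidence file.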
--Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun From carsonhh at gmail.com Thu Mar 6 16:38:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:38:48 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Well... not really. I have no plans to add est2genome support for noncoding genes (non-trivial), so you would either have to remove the ncRNA from your input or filter it out downstream. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 4:33 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Fantastic. Thanks, Carson.
When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: > Yes. I'll fix the naming. > > Thanks, > Carson > > > From: Shaun Jackman > Date: Thursday, March 6, 2014 at 1:56 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I agree that identifying non-coding RNA by homology in general is > a non-trivial problem. In my particular case, I have a well-annotated > reference species that is very closely related (99.2% sequence identity), so > lifting over the annotations from that reference species to my species should > be pretty straightforward. It would be great if MAKER had an option for RNA > sequence homology similar to est2genome that does not imply the sequence is > coding. > > The integration of MAKER-P with tRNAscan is very useful. The identified genes > are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names > with that convention? > > Cheers, > Shaun > > > On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: >> >> Trying to call non-coding RNA from ESTs or even sequence homology is >> extremely messy (a non-trivial problem in most organisms, with a high >> false-positive rate), so MAKER for the most part doesn't even try to do that. It >> focuses only on the coding genes. You can now use tRNAscan and snoscan in >> the newest version for some non-coding RNA support (those features were only >> added a couple of months ago).
So just like other prediction tools (snap, >> augustus, etc.), the primary focus has always been the coding genes. We've >> only started adding non-coding RNA support recently for iPlant, so it's still >> relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >> the tip. >> >> The rRNA genes that are found with est2genome have the feature type set to >> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. >> Ideally the feature type would be set to rRNA or tRNA as appropriate, and >> would omit the UTR and CDS features. Is that a feature that you would be >> interested in adding to MAKER? The rRNA gene names all start with 'rrn' and >> the tRNA gene names with 'trn', as is standard, so determining the >> appropriate type should be straightforward. >> >> Thanks again for your help with this. Cheers, >> Shaun >> >> >> >> On 27 February 2014 17:13, Carson Holt wrote: >>> Set single_exon=1, and the minimum size to a smaller value. I think it's >>> set to 250 right now. Also, est2genome is looking for an ORF, so if there is >>> none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>> >>>> Sorry, ignore my previous question. est_forward also carries forward the >>>> names of protein evidence and works like a charm. Thank you! >>>> >>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>> rrn4.5, rrn5 and tRNA genes didn't make it into the all.gff file. They >>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E-value >>>> (2e-66 < eval_blastn=1e-10).
How should I debug which filter is removing >>>> these hits? >>>> organism_type=prokaryotic >>>> est2genome=1 >>>> protein2genome=1 >>>> est_forward=1 >>>> Cheers, >>>> Shaun >>>> >>>> >>>> >>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>>> Is there a corresponding protein_forward=1 option to map forward protein >>>>> names from protein2genome? >>>>> >>>>> Cheers, >>>>> Shaun >>>>> >>>>> >>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>>> ) wrote: >>>>>> >>>>>> Sorry, I meant to say prefilter on the score in the mRNA column before >>>>>> passing the gff3 to model_gff. >>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>>> >>>>>>> What you can do is run it once with just est_forward=1 and >>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>>> in as model_gff and use the map_forward option to then filter the >>>>>>> results based on mRNA score and that would copy names onto new genes >>>>>>> under the standard MAKER pipeline. Eventually it's really supposed to >>>>>>> go into a separate tool that will map genes onto new assemblies (but >>>>>>> under the hood the tool will just be calling MAKER with certain >>>>>>> parameters restricted). I do this because if people commonly use it >>>>>>> mixed with things like SNAP, I can start to get some very weird >>>>>>> behaviors. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> From: Mikael Brandström Durling >>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> It seems that this could be a very useful option in those cases where >>>>>>> you have firm a priori knowledge of the placement of ESTs. However, >>>>>>> while trying it I note that est_forward implies that the est2genome >>>>>>> predictor is turned on, implicitly.
Is this necessary for this to work? >>>>>>> I'm after the behavior you describe below where exonerate is made to try >>>>>>> really hard within a limited region to align an EST, but I would not >>>>>>> like maker to produce est2genome predictions. >>>>>>> >>>>>>> In general, I think this maker_coor and est_forward is a feature set >>>>>>> that is worthy of being promoted to a documented feature. >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote: >>>>>>> >>>>>>> It will still work without est_forward. It just works a little >>>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>>> stubborn or hard-to-find missing genes after reassembly of a genome. >>>>>>> >>>>>>> If est_forward is provided, MAKER will parse the database to look for >>>>>>> the maker_coor tags early in the pipeline. Then it will create a list >>>>>>> of locations to search, and it will search them even if there are no >>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result >>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will >>>>>>> cause MAKER to look for a match using all of chr1 as the input to >>>>>>> exonerate even when BLAST finds nothing (this is a very, very slow >>>>>>> search, but can help pick up one or two stubborn genes that don't remap >>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters >>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly >>>>>>> errors). The logic here is that, given that I already told >>>>>>> MAKER that, with some degree of confidence, I expect sequence A to map >>>>>>> to location X, it will try its hardest to make it match. >>>>>>> >>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to >>>>>>> the region (that BLAST result has the information in its description >>>>>>> parameter).
MAKER will then ignore seeds completely outside of >>>>>>> maker_coor. In addition, any BLAST seeds that overlap maker_coor will get >>>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>>> exactly. Also, match parameters for exonerate will not be relaxed as >>>>>>> they were with est_forward. >>>>>>> >>>>>>> As you can see, the behavior is slightly different (because it's an >>>>>>> accidental feature). >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Mikael Brandström Durling >>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> That might be a useful and time-saving accidental feature. But, reading >>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>>> well as the configuration option est_forward for this to work. Any >>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on >>>>>>> est_forward=1, right? >>>>>>> >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>>>>>> >>>>>>> Yes. That should work as well as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have >>>>>>> a database of ESTs where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to maker without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>>>>>> >>>>>>> There is a way. It's not a standard option and it's undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that.
The option won't already be there, so you'll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your fasta headers, those can be used to guide the mapping and >>>>>>> naming. For example, gene_id= will ensure different >>>>>>> isoforms that share a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, >>>>>>> and just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> BLAST alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm annotating a genome using a closely related genome from GenBank, >>>>>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence >>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein?
> >>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun From sbrubaker at solazyme.com Thu Mar 6 16:41:55 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Thu, 6 Mar 2014 23:41:55 +0000 Subject: [maker-devel] Long introns from Augustus Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com> Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus. I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created? I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum. So far I don't really see any changes though. Any advice would be greatly appreciated. Thanks, Shane From carsonhh at gmail.com Thu Mar 6 16:46:53 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:46:53 -0700 Subject: [maker-devel] Long introns from Augustus Message-ID: Are these the ab initio calls that are merged, or final MAKER models?
Carson From sbrubaker at solazyme.com Thu Mar 6 17:48:15 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Fri, 7 Mar 2014 00:48:15 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: References: Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. I had noticed that Maker does have some information about max intron length; does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. From mikael.durling at slu.se Mon Mar 10 04:27:25 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Mon, 10 Mar 2014 10:27:25 +0000 Subject: [maker-devel] keep_preds values Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Hi, Can someone please explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either. Thanks, Mikael From robert.king at rothamsted.ac.uk Mon Mar 10 06:17:07 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Mon, 10 Mar 2014 12:17:07 +0000 Subject: [maker-devel] annotation comparison aed plots Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Dear Maker Developers, I've updated a reference that had errors and was a little incomplete, and am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: Uniprot proteins search using species keyword "fusarium"; Pubmed mRNA for the name of the organism; prior annotation reference transcripts. 
Annotation 2: Uniprot proteins search using species keyword "fusarium"; Pubmed mRNA for the name of the organism; prior annotation reference transcripts; mRNA trinity assembly pasafly of different strain (only RNA-seq available). I'm not sure if it was a smart move to use the prior annotation reference transcripts?
I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned in a post last year that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script. I'm waiting on some Perl modules to be installed by our administrator to test it, and to test out the "Evaluator" and "compare" programs too. What do they do? Best Wishes Rob -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning. From dence at genetics.utah.edu Mon Mar 10 08:47:42 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 10 Mar 2014 14:47:42 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Message-ID: Hi Mikael, The keep_preds parameter is often used like a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 to 1. A prediction with an AED of 0 matches the evidence exactly, while a prediction with an AED of 1 isn't overlapped by any evidence. The default behavior for MAKER is to make a gene model out of a prediction with any AED < 1. When you change the keep_preds option from 0 to 1, MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc).
Setting the keep_preds option somewhere between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. From a user standpoint, you will almost certainly lose gene models when you set AED at an intermediate value, but you might benefit by knowing that all your models will now have an AED no higher than a certain value. I hope that helps; let me know if it didn't. ~Daniel PS The original paper that described the AED is Eilbeck et al. in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From carsonhh at gmail.com Mon Mar 10 09:51:21 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 08:51:21 -0700 Subject: [maker-devel] keep_preds values Message-ID: Actually that is false. The keep_preds option is still binary. Any value other than 0 sets it to true. There was discussion about making it a non-binary value, but that has not been implemented.
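Daniel's definition of AED, Carson's correction about keep_preds, and the normalized abAED Carson describes further on in this thread can be tied together in a small model. This is a sketch, not MAKER's code: it assumes the nucleotide-level formula from the Eilbeck et al. (2009) paper Daniel cites, AED = 1 - (SN + SP)/2, treats the evidence at a locus as one set of aligned intervals, and ignores the other filters (single_exon, min_exon, etc.). `keep_preds_enabled`, `promote` and `predictor_concordance` are hypothetical names. `promote` encodes only what the thread states: keep_preds is a switch, and with it off a prediction needs some evidence overlap (AED < 1). `predictor_concordance` illustrates the normalization Carson wants, in which a predictor that made no call counts as complete disagreement.

```python
def covered_bases(intervals):
    """Set of 1-based positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction, evidence):
    """AED = 1 - (SN + SP) / 2, computed over nucleotides.

    SN: fraction of evidence bases covered by the prediction.
    SP: fraction of prediction bases covered by the evidence.
    0 means a perfect match; 1 means no overlap at all.
    """
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)
    specificity = overlap / len(pred)
    return 1.0 - (sensitivity + specificity) / 2.0


def keep_preds_enabled(value):
    """keep_preds is a switch: any value other than 0 turns it on.

    Assumption: the value is compared numerically, so "0.5" counts as on.
    How MAKER itself parses a string such as "0.0" is not stated here.
    """
    return float(value) != 0.0


def promote(prediction, evidence, keep_preds="0"):
    """Simplified rule: keep every prediction if keep_preds is on,
    otherwise only predictions with some evidence overlap (AED < 1)."""
    return keep_preds_enabled(keep_preds) or aed(prediction, evidence) < 1.0


def predictor_concordance(model, calls, n_predictors):
    """Hypothetical normalized abAED-style score for one ab initio model.

    `calls` maps predictor name -> that predictor's overlapping call and
    includes the predictor that produced `model`. Each predictor in use
    contributes its AED against `model`; a predictor with no call in the
    cluster contributes 1, so its absence counts against the model.
    """
    scores = [aed(model, call) for call in calls.values()]
    scores += [1.0] * (n_predictors - len(calls))
    return sum(scores) / n_predictors


model = [(1, 100), (201, 300)]
print(aed(model, model))                       # 0.0
print(aed([(1, 100)], [(51, 150)]))            # 0.5
print(promote([(1, 100)], [(401, 500)]))       # False: no evidence overlap
print(promote([(1, 100)], [(401, 500)], "1"))  # True: keep_preds is on
print(predictor_concordance(model, {"pred_a": model, "pred_b": model}, 2))  # 0.0
print(predictor_concordance(model, {"pred_a": model, "pred_b": model}, 3))  # 0.3333333333333333
```

With two of three predictors agreeing perfectly and the third silent, the score comes out at 1/3, matching the 0.33 Carson says the normalized value should be; counting only the predictors present gives 0, which is the unnormalized behavior he describes.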
?Carson On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >Hi Mikael, > >The keep_preds parameter is often used the same as a binary parameter, >but it doesn't have to be. The concordance that is mentioned in the >comment line is the AED for that prediction. AED is a measurement of how >well a prediction is supported by the evidence and ranges from 0 - 1. A >prediction with an AED of 0 matches the evidence exactly while a >prediction with an AED of 1 isn't overlapped by any evidence. > >The default behavior for MAKER is to make a gene model out of a >prediction with any AED <1. When you change the keep_preds option from 0 >to 1, then MAKER will make a gene model out of any prediction that >matches the other parameters (like single_exon, min_exon, etc). Setting >the keep_preds option to somewhere in between 0 and 1 will set a ceiling >on the AED required for promoting a prediction to a gene model. > >From a user standpoint, when you will almost certainly lose gene models >when you set AED at an intermediate value, but you might benefit by >knowing that all your models will now have an AED of at least a certain >value. > >I hope that helps; let me know if it didn't. > >~Daniel > >PS The original paper that described the AED is Eilbeck et al in BMC >Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >Genetics paper from 2012. > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >Mikael Brandstr?m Durling [mikael.durling at slu.se] >Sent: Monday, March 10, 2014 4:27 AM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] keep_preds values > >Hi, > >Can someone, please, explain the keep_preds parameter, as it works now >with a value between 1 and 0? 
It used to be binary, but now it seems to >test concordance towards something. The maker wiki doesn?t explain it any >further either. > >Thanks, >Mikael > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mikael.durling at slu.se Mon Mar 10 08:57:23 2014 From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=) Date: Mon, 10 Mar 2014 14:57:23 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: Hi Carson and Daniel, That sounds more logical to me. Then it would be appropriate to change the comment of keep_preds in the generated config files. Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsported model? Mikael 10 mar 2014 kl. 16:51 skrev Carson Holt : > Actually that is false. The keep_preds option is still binary. Any value > other than 0 sets it to true. There was discussion about making it a > non-binary value, but that has not been implemented. > > ?Carson > > > On 3/10/14, 7:47 AM, "Daniel Ence" wrote: > >> Hi Mikael, >> >> The keep_preds parameter is often used the same as a binary parameter, >> but it doesn't have to be. The concordance that is mentioned in the >> comment line is the AED for that prediction. AED is a measurement of how >> well a prediction is supported by the evidence and ranges from 0 - 1. A >> prediction with an AED of 0 matches the evidence exactly while a >> prediction with an AED of 1 isn't overlapped by any evidence. 
>> >> The default behavior for MAKER is to make a gene model out of a >> prediction with any AED <1. When you change the keep_preds option from 0 >> to 1, then MAKER will make a gene model out of any prediction that >> matches the other parameters (like single_exon, min_exon, etc). Setting >> the keep_preds option to somewhere in between 0 and 1 will set a ceiling >> on the AED required for promoting a prediction to a gene model. >> >> From a user standpoint, when you will almost certainly lose gene models >> when you set AED at an intermediate value, but you might benefit by >> knowing that all your models will now have an AED of at least a certain >> value. >> >> I hope that helps; let me know if it didn't. >> >> ~Daniel >> >> PS The original paper that described the AED is Eilbeck et al in BMC >> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >> Genetics paper from 2012. >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >> Mikael Brandstr?m Durling [mikael.durling at slu.se] >> Sent: Monday, March 10, 2014 4:27 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] keep_preds values >> >> Hi, >> >> Can someone, please, explain the keep_preds parameter, as it works now >> with a value between 1 and 0? It used to be binary, but now it seems to >> test concordance towards something. The maker wiki doesn?t explain it any >> further either. 
>> >> Thanks, >> Mikael >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Mon Mar 10 09:59:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 08:59:43 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: Yes. It will eventually perform an AED like calculation between multiple predictors (i.e. if you use 3 predictors it, then you require support by at least 2 predictors across all exons to get a value of 0.33). A value of 0 would be perfect concordance across all 3 predictors. ?Carson On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" wrote: >Hi Carson and Daniel, > >That sounds more logical to me. Then it would be appropriate to change >the comment of keep_preds in the generated config files. > >Would it make sense to make keep_preds a non-binary value to evaluate the >concordance between ab initio models obtained from different predictors? >That would assume that it is less likely to be a false positive when two >or more predictors suggest the same unsported model? > >Mikael > > >10 mar 2014 kl. 16:51 skrev Carson Holt : > >> Actually that is false. The keep_preds option is still binary. Any >>value >> other than 0 sets it to true. There was discussion about making it a >> non-binary value, but that has not been implemented. >> >> ?Carson >> >> >> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >> >>> Hi Mikael, >>> >>> The keep_preds parameter is often used the same as a binary parameter, >>> but it doesn't have to be. The concordance that is mentioned in the >>> comment line is the AED for that prediction. 
AED is a measurement of >>>how >>> well a prediction is supported by the evidence and ranges from 0 - 1. A >>> prediction with an AED of 0 matches the evidence exactly while a >>> prediction with an AED of 1 isn't overlapped by any evidence. >>> >>> The default behavior for MAKER is to make a gene model out of a >>> prediction with any AED <1. When you change the keep_preds option from >>>0 >>> to 1, then MAKER will make a gene model out of any prediction that >>> matches the other parameters (like single_exon, min_exon, etc). Setting >>> the keep_preds option to somewhere in between 0 and 1 will set a >>>ceiling >>> on the AED required for promoting a prediction to a gene model. >>> >>> From a user standpoint, when you will almost certainly lose gene models >>> when you set AED at an intermediate value, but you might benefit by >>> knowing that all your models will now have an AED of at least a certain >>> value. >>> >>> I hope that helps; let me know if it didn't. >>> >>> ~Daniel >>> >>> PS The original paper that described the AED is Eilbeck et al in BMC >>> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>> Genetics paper from 2012. >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>> Mikael Brandstr?m Durling [mikael.durling at slu.se] >>> Sent: Monday, March 10, 2014 4:27 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] keep_preds values >>> >>> Hi, >>> >>> Can someone, please, explain the keep_preds parameter, as it works now >>> with a value between 1 and 0? It used to be binary, but now it seems to >>> test concordance towards something. The maker wiki doesn?t explain it >>>any >>> further either. 
>>> >>> Thanks, >>> Mikael >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > From mikael.durling at slu.se Mon Mar 10 09:08:16 2014 From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=) Date: Mon, 10 Mar 2014 15:08:16 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Ok. But that is not implemented no as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models? Mikael 10 mar 2014 kl. 16:59 skrev Carson Holt : > Yes. It will eventually perform an AED like calculation between multiple > predictors (i.e. if you use 3 predictors it, then you require support by > at least 2 predictors across all exons to get a value of 0.33). A value > of 0 would be perfect concordance across all 3 predictors. > > ?Carson > > > > > On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" > wrote: > >> Hi Carson and Daniel, >> >> That sounds more logical to me. Then it would be appropriate to change >> the comment of keep_preds in the generated config files. >> >> Would it make sense to make keep_preds a non-binary value to evaluate the >> concordance between ab initio models obtained from different predictors? >> That would assume that it is less likely to be a false positive when two >> or more predictors suggest the same unsported model? >> >> Mikael >> >> >> 10 mar 2014 kl. 16:51 skrev Carson Holt : >> >>> Actually that is false. The keep_preds option is still binary. Any >>> value >>> other than 0 sets it to true. 
There was discussion about making it a >>> non-binary value, but that has not been implemented. >>> >>> ?Carson >>> >>> >>> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >>> >>>> Hi Mikael, >>>> >>>> The keep_preds parameter is often used the same as a binary parameter, >>>> but it doesn't have to be. The concordance that is mentioned in the >>>> comment line is the AED for that prediction. AED is a measurement of >>>> how >>>> well a prediction is supported by the evidence and ranges from 0 - 1. A >>>> prediction with an AED of 0 matches the evidence exactly while a >>>> prediction with an AED of 1 isn't overlapped by any evidence. >>>> >>>> The default behavior for MAKER is to make a gene model out of a >>>> prediction with any AED <1. When you change the keep_preds option from >>>> 0 >>>> to 1, then MAKER will make a gene model out of any prediction that >>>> matches the other parameters (like single_exon, min_exon, etc). Setting >>>> the keep_preds option to somewhere in between 0 and 1 will set a >>>> ceiling >>>> on the AED required for promoting a prediction to a gene model. >>>> >>>> From a user standpoint, when you will almost certainly lose gene models >>>> when you set AED at an intermediate value, but you might benefit by >>>> knowing that all your models will now have an AED of at least a certain >>>> value. >>>> >>>> I hope that helps; let me know if it didn't. >>>> >>>> ~Daniel >>>> >>>> PS The original paper that described the AED is Eilbeck et al in BMC >>>> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>>> Genetics paper from 2012. 
>>>> >>>> Daniel Ence >>>> Graduate Student >>>> Eccles Institute of Human Genetics >>>> University of Utah >>>> 15 North 2030 East, Room 2100 >>>> Salt Lake City, UT 84112-5330 >>>> ________________________________________ >>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>> Mikael Brandstr?m Durling [mikael.durling at slu.se] >>>> Sent: Monday, March 10, 2014 4:27 AM >>>> To: maker-devel at yandell-lab.org >>>> Subject: [maker-devel] keep_preds values >>>> >>>> Hi, >>>> >>>> Can someone, please, explain the keep_preds parameter, as it works now >>>> with a value between 1 and 0? It used to be binary, but now it seems to >>>> test concordance towards something. The maker wiki doesn?t explain it >>>> any >>>> further either. >>>> >>>> Thanks, >>>> Mikael >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> > > From carsonhh at gmail.com Mon Mar 10 10:16:59 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:16:59 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Message-ID: There is a value called abAED being calculated, which somewhat captures the concordance among the predictors. It is not currently printed in the GFF3, but it is used to identify the best non-overlapping ab initio predictor to put in the non-overlapping fasta file. There are a couple of things I still need to do with it to though. It?s not yet normalized to take into account the absence of a predictor in the cluster of overlapping predictions. 
For example, if I have 2 predictors and 2 make perfectly matching calls and 1 makes no call, they get a score of 0 before I have perfect concordance between what?s there, but I really should make it 0.33 because the abscence of the third predictor is meaningful. The unnormalized concordance value is fine for deciding which overlapping model to keep in the file, but not for global comparison. ?Carson On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" wrote: >Ok. But that is not implemented no as far as I can tell from the source, >right? Or is it reflected in the AED for the unsupported models? > >Mikael > >10 mar 2014 kl. 16:59 skrev Carson Holt : > >> Yes. It will eventually perform an AED like calculation between >>multiple >> predictors (i.e. if you use 3 predictors it, then you require support by >> at least 2 predictors across all exons to get a value of 0.33). A value >> of 0 would be perfect concordance across all 3 predictors. >> >> ?Carson >> >> >> >> >> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" >> wrote: >> >>> Hi Carson and Daniel, >>> >>> That sounds more logical to me. Then it would be appropriate to change >>> the comment of keep_preds in the generated config files. >>> >>> Would it make sense to make keep_preds a non-binary value to evaluate >>>the >>> concordance between ab initio models obtained from different >>>predictors? >>> That would assume that it is less likely to be a false positive when >>>two >>> or more predictors suggest the same unsported model? >>> >>> Mikael >>> >>> >>> 10 mar 2014 kl. 16:51 skrev Carson Holt : >>> >>>> Actually that is false. The keep_preds option is still binary. Any >>>> value >>>> other than 0 sets it to true. There was discussion about making it a >>>> non-binary value, but that has not been implemented. 
>>>> >>>> ?Carson >>>> >>>> >>>> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >>>> >>>>> Hi Mikael, >>>>> >>>>> The keep_preds parameter is often used the same as a binary >>>>>parameter, >>>>> but it doesn't have to be. The concordance that is mentioned in the >>>>> comment line is the AED for that prediction. AED is a measurement of >>>>> how >>>>> well a prediction is supported by the evidence and ranges from 0 - >>>>>1. A >>>>> prediction with an AED of 0 matches the evidence exactly while a >>>>> prediction with an AED of 1 isn't overlapped by any evidence. >>>>> >>>>> The default behavior for MAKER is to make a gene model out of a >>>>> prediction with any AED <1. When you change the keep_preds option >>>>>from >>>>> 0 >>>>> to 1, then MAKER will make a gene model out of any prediction that >>>>> matches the other parameters (like single_exon, min_exon, etc). >>>>>Setting >>>>> the keep_preds option to somewhere in between 0 and 1 will set a >>>>> ceiling >>>>> on the AED required for promoting a prediction to a gene model. >>>>> >>>>> From a user standpoint, when you will almost certainly lose gene >>>>>models >>>>> when you set AED at an intermediate value, but you might benefit by >>>>> knowing that all your models will now have an AED of at least a >>>>>certain >>>>> value. >>>>> >>>>> I hope that helps; let me know if it didn't. >>>>> >>>>> ~Daniel >>>>> >>>>> PS The original paper that described the AED is Eilbeck et al in BMC >>>>> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>>>> Genetics paper from 2012. 
>>>>> >>>>> Daniel Ence >>>>> Graduate Student >>>>> Eccles Institute of Human Genetics >>>>> University of Utah >>>>> 15 North 2030 East, Room 2100 >>>>> Salt Lake City, UT 84112-5330 >>>>> ________________________________________ >>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>> Mikael Brandstr?m Durling [mikael.durling at slu.se] >>>>> Sent: Monday, March 10, 2014 4:27 AM >>>>> To: maker-devel at yandell-lab.org >>>>> Subject: [maker-devel] keep_preds values >>>>> >>>>> Hi, >>>>> >>>>> Can someone, please, explain the keep_preds parameter, as it works >>>>>now >>>>> with a value between 1 and 0? It used to be binary, but now it seems >>>>>to >>>>> test concordance towards something. The maker wiki doesn?t explain it >>>>> any >>>>> further either. >>>>> >>>>> Thanks, >>>>> Mikael >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >>>> >>>> >>> >> >> > From carsonhh at gmail.com Mon Mar 10 10:18:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:18:14 -0700 Subject: [maker-devel] keep_preds values In-Reply-To: References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se> Message-ID: Sorry meant to say "3 predictors and 2 make perfectly matching calls and 1 makes no call." On 3/10/14, 9:16 AM, "Carson Holt" wrote: >There is a value called abAED being calculated, which somewhat captures >the concordance among the predictors. It is not currently printed in the >GFF3, but it is used to identify the best non-overlapping ab initio >predictor to put in the non-overlapping fasta file. 
There are a couple of >things I still need to do with it to though. It?s not yet normalized to >take into account the absence of a predictor in the cluster of overlapping >predictions. For example, if I have 2 predictors and 2 make perfectly >matching calls and 1 makes no call, they get a score of 0 before I have >perfect concordance between what?s there, but I really should make it 0.33 >because the abscence of the third predictor is meaningful. The >unnormalized concordance value is fine for deciding which overlapping >model to keep in the file, but not for global comparison. > >?Carson > > > >On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" >wrote: > >>Ok. But that is not implemented no as far as I can tell from the source, >>right? Or is it reflected in the AED for the unsupported models? >> >>Mikael >> >>10 mar 2014 kl. 16:59 skrev Carson Holt : >> >>> Yes. It will eventually perform an AED like calculation between >>>multiple >>> predictors (i.e. if you use 3 predictors it, then you require support >>>by >>> at least 2 predictors across all exons to get a value of 0.33). A >>>value >>> of 0 would be perfect concordance across all 3 predictors. >>> >>> ?Carson >>> >>> >>> >>> >>> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" >>> >>> wrote: >>> >>>> Hi Carson and Daniel, >>>> >>>> That sounds more logical to me. Then it would be appropriate to >>>>change >>>> the comment of keep_preds in the generated config files. >>>> >>>> Would it make sense to make keep_preds a non-binary value to evaluate >>>>the >>>> concordance between ab initio models obtained from different >>>>predictors? >>>> That would assume that it is less likely to be a false positive when >>>>two >>>> or more predictors suggest the same unsported model? >>>> >>>> Mikael >>>> >>>> >>>> 10 mar 2014 kl. 16:51 skrev Carson Holt : >>>> >>>>> Actually that is false. The keep_preds option is still binary. Any >>>>> value >>>>> other than 0 sets it to true. 
There was discussion about making it a >>>>> non-binary value, but that has not been implemented. >>>>> >>>>> ?Carson >>>>> >>>>> >>>>> On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >>>>> >>>>>> Hi Mikael, >>>>>> >>>>>> The keep_preds parameter is often used the same as a binary >>>>>>parameter, >>>>>> but it doesn't have to be. The concordance that is mentioned in the >>>>>> comment line is the AED for that prediction. AED is a measurement of >>>>>> how >>>>>> well a prediction is supported by the evidence and ranges from 0 - >>>>>>1. A >>>>>> prediction with an AED of 0 matches the evidence exactly while a >>>>>> prediction with an AED of 1 isn't overlapped by any evidence. >>>>>> >>>>>> The default behavior for MAKER is to make a gene model out of a >>>>>> prediction with any AED <1. When you change the keep_preds option >>>>>>from >>>>>> 0 >>>>>> to 1, then MAKER will make a gene model out of any prediction that >>>>>> matches the other parameters (like single_exon, min_exon, etc). >>>>>>Setting >>>>>> the keep_preds option to somewhere in between 0 and 1 will set a >>>>>> ceiling >>>>>> on the AED required for promoting a prediction to a gene model. >>>>>> >>>>>> From a user standpoint, when you will almost certainly lose gene >>>>>>models >>>>>> when you set AED at an intermediate value, but you might benefit by >>>>>> knowing that all your models will now have an AED of at least a >>>>>>certain >>>>>> value. >>>>>> >>>>>> I hope that helps; let me know if it didn't. >>>>>> >>>>>> ~Daniel >>>>>> >>>>>> PS The original paper that described the AED is Eilbeck et al in BMC >>>>>> Bioinformatics 2009. It's also discussed in more detail in the >>>>>>MAKER2 >>>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >>>>>> Genetics paper from 2012. 
>>>>>> >>>>>> Daniel Ence >>>>>> Graduate Student >>>>>> Eccles Institute of Human Genetics >>>>>> University of Utah >>>>>> 15 North 2030 East, Room 2100 >>>>>> Salt Lake City, UT 84112-5330 >>>>>> ________________________________________ >>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>>> Mikael Brandström Durling [mikael.durling at slu.se] >>>>>> Sent: Monday, March 10, 2014 4:27 AM >>>>>> To: maker-devel at yandell-lab.org >>>>>> Subject: [maker-devel] keep_preds values >>>>>> >>>>>> Hi, >>>>>> >>>>>> Can someone please explain the keep_preds parameter, as it works >>>>>>now >>>>>> with a value between 1 and 0? It used to be binary, but now it seems >>>>>>to >>>>>> test concordance towards something. The maker wiki doesn't explain >>>>>>it >>>>>> any >>>>>> further either. >>>>>> >>>>>> Thanks, >>>>>> Mikael >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>> >>>>> >>>> >>> >>> >> > > From carsonhh at gmail.com Mon Mar 10 10:25:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 09:25:50 -0700 Subject: [maker-devel] annotation comparison aed plots Message-ID: I don't know about Michael's script, but I've always used eval. It produces sensitivity/specificity metrics. It assumes the first models are 100% correct, and then tells you the sensitivity/specificity values for the second models. It is therefore not a quality metric. Instead you should view it as a change metric.
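A small sketch of the "change metric" reading of sensitivity and specificity that Carson describes: the first annotation is treated as the truth set and the second is scored against it, so lost exons lower sensitivity and gained exons lower specificity. This is not Eval's code. The exon tuple layout and the exact-match rule are assumptions, and Eval reports many more statistics than these two numbers.

```python
# Hypothetical sketch (not Eval): exon-level sensitivity/specificity of
# annotation B measured against annotation A, which is assumed 100% correct.
# Exons are (seqid, start, end, strand) tuples and must match exactly.


def sn_sp(reference, other):
    ref, new = set(reference), set(other)
    shared = len(ref & new)
    sensitivity = shared / len(ref) if ref else 0.0  # drops when exons are lost
    specificity = shared / len(new) if new else 0.0  # drops when exons are gained
    return sensitivity, specificity


if __name__ == "__main__":
    v1 = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 900, 950, "-")]
    v2 = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
          ("chr1", 1200, 1300, "+"), ("chr1", 1500, 1600, "+")]
    sn, sp = sn_sp(v1, v2)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")  # 2 of 3 kept, 2 of 4 shared
```

Exact matching is the strictest choice; a looser overlap rule (as in compare_annotations_3.2.pl, which Mike describes as gene level with one nucleotide of exon overlap) would give higher numbers for the same pair of annotations.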
Lower sensitivity tells you that models/exons have been lost between versions, and lower specificity tells you models/exons have been gained. There will also be a lot of generic statistics on exon/intron distribution and UTR length. Then the AED values from the MAKER run can be used independently to evaluate how well models match the evidence. --Carson From: "Robert King (RRes-Roth)" Date: Monday, March 10, 2014 at 5:17 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] annotation comparison aed plots Dear Maker Developers, I've updated a reference that had errors and was a little incomplete and am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: Uniprot proteins search using species keyword "fusarium" Pubmed mRNA for the name of the organism Prior annotation reference transcripts Annotation 2: Uniprot proteins search using species keyword "fusarium" Pubmed mRNA for the name of the organism Prior annotation reference transcripts mRNA trinity assembly pasafly of different strain (only RNA-seq available) I'm not sure if it was a smart move to use the prior annotation reference transcripts? I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post Mike Campbell has a script to produce these, do you know if he will post it? I've got the Eval program and converted to gtf format using the provided script, just waiting on some perl modules to be installed by admin to test it. I'm waiting on some perl modules to be installed by our administrator to test out the "Evaluator" and "compare" programs too, what do they do? Best Wishes Rob -- This message has been scanned for viruses and dangerous content by MailScanner , and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses.
However, you are fully responsible for performing any virus scanning. From michael.s.campbell1 at gmail.com Mon Mar 10 09:50:53 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:50:53 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: One more point. The sensitivity, specificity, and accuracy produced by the compare_annotations_3.2.pl script are gene level, and overlap between annotation sets is defined very liberally, as at least one nucleotide of exon overlap. Mike On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Robert, > > Here are the scripts that were mentioned before. > > The AED_cdf_generator.pl script is for making cumulative distribution > function plots based on annotation edit distance. This script is quite > simple and straightforward in its internals. > > The compare_annotations_3.2.pl script is for generating summary stats for > annotations and will compare two annotations of the same assembly. > > You can run either script without arguments to get a usage statement. > > Thanks, > Mike > > > On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) < > robert.king at rothamsted.ac.uk> wrote: > >> Dear Maker Developers, >> >> >> >> I've updated a reference that had errors and was a little incomplete >> and am now trying to produce an annotation for it. Please note the reference >> has not changed dramatically.
I've produced two annotations using as >> evidence: >> >> >> >> Annotation 1: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> >> >> Annotation 2: >> >> Uniprot proteins search using species keyword "fusarium" >> >> Pubmed mRNA for the name of the organism >> >> Prior annotation reference transcripts >> >> mRNA trinity assembly pasafly of different strain (only RNA-seq available) >> >> >> >> I'm not sure if it was a smart move to use the prior annotation reference >> transcripts? >> >> >> >> I want to compare these two annotations and have produced AED scores. How >> do I generate summary stats/figures to compare annotations? You mentioned >> last year in a post Mike Campbell has a script to produce these, do you >> know if he will post it? I've got the Eval program and converted to gtf >> format using the provided script, just waiting on some perl modules to be >> installed by admin to test it. I'm waiting on some perl modules to be >> installed by our administrator to test out the "Evaluator" and "compare" >> programs too, what do they do? >> >> >> >> Best Wishes >> >> Rob >> >> > > > -- > Michael Campbell MS, RD.
Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 From cjfields at illinois.edu Mon Mar 10 09:52:50 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 10 Mar 2014 15:52:50 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in # models overall, which is worse. GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when used with both Augustus and SNAP or with Augustus alone. I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked on in the past. This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first. Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add these like a 'plugin'? chris From carsonhh at gmail.com Mon Mar 10 11:05:24 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 10:05:24 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) Message-ID: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints"
from evidence alignments (for example, we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence. Perhaps all proteins from chicken and pigeon, for example. Then you shouldn't lose any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented. Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when used with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked on in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a 'plugin'?
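Carson's suggestion of converting another predictor's calls to GFF3 and passing them in could look roughly like the sketch below. The input layout (gene id, sequence id, strand, list of coding exons) and the `geneid` source tag are hypothetical. geneid's real output would first have to be parsed into that shape, and the feature types MAKER's external-prediction input expects should be checked against the MAKER documentation before relying on this.

```python
# Hypothetical converter: write predictor calls as GFF3 gene/mRNA/exon/CDS rows.
# Input: (gene_id, seqid, strand, [(start, end), ...]) with 1-based inclusive,
# coding-only exons. This layout is an assumption, not geneid's native format.


def to_gff3(predictions, source="geneid"):
    lines = ["##gff-version 3"]
    for gene_id, seqid, strand, exons in predictions:
        exons = sorted(exons)
        start, end = exons[0][0], exons[-1][1]
        mrna_id = f"{gene_id}-mRNA-1"
        lines.append(f"{seqid}\t{source}\tgene\t{start}\t{end}\t.\t{strand}\t.\tID={gene_id}")
        lines.append(f"{seqid}\t{source}\tmRNA\t{start}\t{end}\t.\t{strand}\t.\t"
                     f"ID={mrna_id};Parent={gene_id}")
        # CDS phase is counted in transcription order, so minus-strand exons
        # are walked from the highest genomic coordinate down.
        ordered = exons if strand == "+" else list(reversed(exons))
        coding_so_far = 0
        for n, (s, e) in enumerate(ordered, 1):
            phase = (3 - coding_so_far % 3) % 3  # bases to skip to reach a codon start
            lines.append(f"{seqid}\t{source}\texon\t{s}\t{e}\t.\t{strand}\t.\t"
                         f"ID={mrna_id}:exon{n};Parent={mrna_id}")
            lines.append(f"{seqid}\t{source}\tCDS\t{s}\t{e}\t.\t{strand}\t{phase}\t"
                         f"ID={mrna_id}:cds;Parent={mrna_id}")
            coding_so_far += e - s + 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_gff3([("geneid_g1", "scaffold1", "+", [(1001, 1200), (1301, 1450)])]), end="")
```

All CDS rows of one transcript share an ID, which GFF3 permits for a single discontinuous feature.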
> >chris From michael.s.campbell1 at gmail.com Mon Mar 10 09:47:50 2014 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 10 Mar 2014 09:47:50 -0600 Subject: [maker-devel] annotation comparison aed plots In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Message-ID: Hi Robert, Here are the scripts that were mentioned before. The AED_cdf_generator.pl script is for making cumulative distribution function plots based on annotation edit distance. This script is quite simple and straightforward in its internals. The compare_annotations_3.2.pl script is for generating summary stats for annotations and will compare two annotations of the same assembly. You can run either script without arguments to get a usage statement. Thanks, Mike On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) < robert.king at rothamsted.ac.uk> wrote: > Dear Maker Developers, > > > > I've updated a reference that had errors and was a little incomplete > and am now trying to produce an annotation for it. Please note the reference > has not changed dramatically. I've produced two annotations using as > evidence: > > > > Annotation 1: > > Uniprot proteins search using species keyword "fusarium" > > Pubmed mRNA for the name of the organism > > Prior annotation reference transcripts > > > > Annotation 2: > > Uniprot proteins search using species keyword "fusarium" > > Pubmed mRNA for the name of the organism > > Prior annotation reference transcripts > > mRNA trinity assembly pasafly of different strain (only RNA-seq available) > > > > I'm not sure if it was a smart move to use the prior annotation reference > transcripts?
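The cumulative distribution plots Mike mentions can be approximated with the minimal sketch below. It is not AED_cdf_generator.pl. It assumes MAKER-style GFF3 in which each mRNA row carries an `_AED=` attribute, and it only prints (threshold, fraction) pairs that any plotting tool could draw; running it once per annotation gives two curves to compare.

```python
# Minimal AED cumulative-distribution sketch (not AED_cdf_generator.pl).
# Assumes MAKER-style GFF3 where mRNA rows carry an _AED=<value> attribute.
import sys


def aed_values(gff_lines):
    values = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # also skips FASTA lines appended after ##FASTA
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "_AED":
                values.append(float(value))
    return values


def cdf(values, step=0.025):
    """Fraction of models with AED <= x for x = 0, step, 2*step, ..., 1."""
    n_steps = int(round(1 / step))
    points = []
    for i in range(n_steps + 1):
        x = i * step
        below = sum(v <= x + 1e-9 for v in values)  # small tolerance for float steps
        points.append((round(x, 3), below / len(values) if values else 0.0))
    return points


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for x, frac in cdf(aed_values(handle)):
            print(f"{x}\t{frac:.3f}")
```

A curve that rises earlier (more models at low AED) indicates an annotation that agrees better with its evidence.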
> > > > I want to compare these two annotations and have produced AED scores. How > do I generate summary stats/figures to compare annotations? You mentioned > last year in a post Mike Campbell has a script to produce these, do you > know if he will post it? I've got the Eval program and converted to gtf > format using the provided script, just waiting on some perl modules to be > installed by admin to test it. I'm waiting on some perl modules to be > installed by our administrator to test out the "Evaluator" and "compare" > programs too, what do they do? > > > > Best Wishes > > Rob > > -- Michael Campbell MS, RD. Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- A non-text attachment was scrubbed... Name: AED_cdf_generator.pl Type: text/x-perl-script Size: 2580 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl Type: text/x-perl-script Size: 29155 bytes Desc: not available URL: From sajeet at gmail.com Mon Mar 10 12:31:40 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Mon, 10 Mar 2014 11:31:40 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: > Adding a new predictor can take some time. It obviously requires some > coding. It's usually not too hard just to convert results to GFF3 and > then pass it in. Integrated support is really only beneficial for > predictors that can take "hints" from evidence alignments (for example, we > are working on EVM integration right now - > http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give > problems, just drop them. GeneMark really doesn't work very well on > genomes with complex intron/exon structure (and I really wouldn't use it > for anything but fungi). > > Make sure you are also giving sufficient protein evidence. Perhaps all > proteins from chicken and pigeon, for example. Then you shouldn't lose > any true genes if just using Augustus. Also try not to use gene > count as an indicator of performance. The value is very deceptive, > especially if the genome assembly is fragmented. > > Thanks, > Carson > > > > On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: > > >I have been running MAKER 2.31 using Augustus and SNAP on an avian > >genome. Augustus gives pretty decent gene model predictions based on a > >custom model we have and the hints MAKER provides. However, SNAP seems > >to throw out a ton of false positives; in many cases this appears to > >cause erroneous gene fusions.
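Sajeet's workaround (training GeneMark on a hard-masked genome) amounts to replacing soft-masked bases with N before self-training. A minimal sketch, assuming the usual convention that soft-masked repeats are written in lowercase; the two command-line file names are placeholders.

```python
# Minimal sketch: convert a soft-masked FASTA (repeats in lowercase) into a
# hard-masked one (repeats replaced by N), e.g. for GeneMark self-training.
import sys


def hard_mask(seq):
    """Replace every lowercase (soft-masked) base with N; keep uppercase bases."""
    return "".join("N" if base.islower() else base for base in seq)


def hard_mask_fasta(in_handle, out_handle):
    for line in in_handle:
        if line.startswith(">"):
            out_handle.write(line)  # headers pass through unchanged
        else:
            out_handle.write(hard_mask(line.rstrip("\n")) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        hard_mask_fasta(src, dst)
```

The hard-masked copy would only be used for training; the original soft-masked genome can still be what MAKER annotates.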
Leaving out SNAP altogether, however, leads > >to a marked decrease in # models overall, which is worse. GeneMark had a > >very similar problem (high # false positives) and thus no marked > >improvement, either when used with both Augustus and SNAP or with > >Augustus alone. > > > >I have been exploring using geneid > >(http://genome.crg.es/software/geneid/) as an alternative, based on some > >feedback on another project I worked on in the past. This would be > >fed into MAKER using external GFF, but I wanted to see if anyone has > >tried geneid with MAKER first. > > > >Finally, how hard would it be to incorporate alternative callers into > >MAKER? For instance, would it be possible to add these like a 'plugin'? > > > >chris From carsonhh at gmail.com Mon Mar 10 22:13:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 22:13:43 -0600 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
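To check whether fused models coincide with unusually long introns, as in Shane's compact genome, one could flag every model with an intron above a cutoff. This sketch assumes models have already been parsed into exon intervals (1-based, inclusive); the model IDs are made up and the 2,000 bp default is an arbitrary placeholder, not a recommended value.

```python
# Sketch: flag gene models whose introns exceed a cutoff, a common signature of
# fused models in compact genomes. Input layout and cutoff are assumptions.


def introns(exons):
    """Gaps between consecutive exons, as 1-based inclusive (start, end)."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])
            if b_start - a_end > 1]  # abutting exons leave no intron


def long_intron_models(models, max_intron=2000):
    """Map model id -> list of introns longer than max_intron."""
    flagged = {}
    for model_id, exons in models.items():
        too_long = [iv for iv in introns(exons) if iv[1] - iv[0] + 1 > max_intron]
        if too_long:
            flagged[model_id] = too_long
    return flagged


if __name__ == "__main__":
    models = {"augustus_g1": [(1, 300), (400, 700)],
              "augustus_g2": [(1, 300), (5000, 5600), (5700, 6000)]}
    print(long_intron_models(models))  # only augustus_g2 has a >2 kb intron
```

Flagged models are candidates for manual review or for splitting, not automatic proof of a fusion.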
--Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged, or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that someone >> had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane >> >> > > From darasappan at gmail.com Mon Mar 10 14:14:03 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 10 Mar 2014 15:14:03 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing Message-ID: Hello, I've been running maker with different assembly files, reference files etc and I check the output by: 1. concatenating the gff files 2. concatenating the *transcripts.fasta files 3.
concatenating the *proteins.fasta files I'm noticing that when I ran maker twice with the same parameters, the second time around, many of the output subdirectories do not have a *transcripts.fasta or *proteins.fasta file in them. There are 251 subdirectories and only 97 of them have all 3 output files. The Maker log looks ok to me, but I've attached it here as well. What could be the reason for this? Thanks dhivya -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.o1813247.gz Type: application/x-gzip Size: 13857217 bytes Desc: not available URL: From sbrubaker at solazyme.com Tue Mar 11 11:06:57 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Tue, 11 Mar 2014 17:06:57 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com> Ok, thank you. -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, March 10, 2014 9:14 PM To: Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Long introns from Augustus Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes. --Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?
I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged, or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that >> someone had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane >> >> > > From carson.holt at genetics.utah.edu Thu Mar 13 10:00:06 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 13 Mar 2014 16:00:06 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Just resending this to the correct maker-devel address. When replying, please do not CC the incorrect maker-devel-bounce address. Thanks, Carson On 3/13/14, 9:56 AM, "Carson Holt" wrote: >FGENESH is not a heavily used tool, so depending on which version it is >(either too old or too new), output might be slightly different, which >could cause incorrect parsing. Could you tar up your maker.output folder,
Could you tar up your maker.output folder, >and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >(send me either your user/guest ID after you upload). > >For the BLAST error, use BLAST+ instead. You are using blastall which is >the old legacy version of NCBI BLAST. You can do this by setting the >blast type in maker_bopts.ctl and the location of executables in >maker_exe.ctl. > >Thanks, >Carson > > > >On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote: > >>Dear Maker users >> >> >>I ran maker (2.31) on a fungal genome and found out that it inserted the >>word SCLAR followed by a pair of bracket like this (0x22de7020) >>inserted in the nucleotide sequence of some of the genes. This seems to >>be related to transcripts predicted by fgenesh_masked. >> >> >>Here is an example for one of the genes >> >> >>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>offset:0 AE >>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>TTTCGACAAGC >> >>The same genome sequence was used for the first round of maker (2.10) >>without such problem. 
I checked the sequence for the scaffold related to >>one of the affected transcripts and there was no error in the sequence. >>I am not sure what is causing this. The only error that I could spot in >>the output error file is the following >> >> >>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>contexts/frames of query sequences. >> >> >> >>Your help is appreciated >> >> >> >>HB >> >> >> >> >> >> > From carsonhh at gmail.com Thu Mar 13 10:14:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:14:54 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: Note protein/transcript fastas are only created when there are gene models to output to those files (so their absence means there were no gene models for that contig). Most sequences without protein/transcript fastas in your sample are very short and thus don't contain anything. The remaining ones either have no est2genome results, or the est2genome alignments do not have a sufficient open reading frame to be turned into a gene model (false merging of regions by Trinity can cause this, so make sure you use the Jaccard index option when assembling reads with Trinity to avoid this). You are using only the est2genome=1 option. This will result in a limited set of genes that can be used for training SNAP/Augustus (so not getting results on all contigs is expected). You really won't get much as far as results until you have one of the ab initio predictors turned on. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 11, 2014 at 8:52 AM To: Carson Holt Cc: Daniel Ence Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Alright, done. My username is daras Thanks Dhivya On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > Input and compressed file of output.
> > Thanks, > Carson > > From: dhivya arasappan > Date: Monday, March 10, 2014 at 2:09 PM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Do you mean the whole maker output? > > Thanks > dhivya > > On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: > >> Could you upload everything here: >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then send us the link generated or your user ID. >> >> Thanks, >> Carson >> >> >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 1:50 PM >> To: Carson Holt , Daniel Ence >> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson and Daniel, >> >> I'm sending this across to you separately since the maker list is blocking my >> email due to attachment size. >> >> As always, thanks for any guidance you can provide. >> Dhivya >> >> >> Begin forwarded message: >> >>> From: dhivya arasappan >>> Date: March 10, 2014 3:14:03 PM CDT >>> To: maker-devel at yandell-lab.org >>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>> >>> >>> Hello, >>> >>> I've been running maker with different assembly files, reference files etc >>> and I check the output by: >>> >>> 1. concatenating the gff files >>> 2. concatenating the *transcripts.fasta files >>> 3. concatenating the *proteins.fasta files >>> >>> I'm noticing that when I ran maker twice with the same parameters, the second >>> time around, many of the output subdirectories do not have a >>> *transcripts.fasta or *proteins.fasta file in them. >>> There are 251 subdirectories and only 97 of them have all 3 output files. >>> The Maker log looks ok to me, but I've attached it here as well. >>> >>> What could be the reason for this? >>> >>> Thanks >>> dhivya >>> >>> >>> >>> >>> >> >
From carsonhh at gmail.com Thu Mar 13 10:55:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:55:40 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Message-ID: The second time, it should have just started where it left off, so it would run faster (because the processing from the previous job counted towards the second one). The archived output you sent me had 21,183 proteins and transcripts. If you are using fasta_merge to collect them, just make sure the datastore.index file is not truncated or corrupt; otherwise it won't collect all the fastas from every contig. You can rebuild the datastore.index using the -dsindex flag with MAKER, if you want to check that. You can also have maker just regenerate results without rerunning BLAST etc. by using the -a flag, if you want to recalculate all results quickly (it rebuilds all FASTA and GFF3 without redoing most analysis). --Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 10:47 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Thanks Carson for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ESTs based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with >20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts.
My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output. When I reran maker, just changing the time allocated, it completed much faster, but is giving far fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something, although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene models to > output to those files (so their absence means there were no gene models for > that contig). Most sequences without protein/transcript fastas in your sample > are very short and thus don't contain anything. What is left either have no > est2genome results or the est2genome alignments do not have sufficient open > reading frame to be turned into a gene model (false merging of regions by > trinity can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a limited set > of genes that can be used for training SNAP/Augustus (so not getting results > on all contigs is expected). You really won't get much as far as results > until you have one of the ab initio predictors turned on.
> > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > All right, done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output. >> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here: >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is blocking my >>> email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>>> >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference files etc >>>> and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3.
concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, the second >>>> time around, many of the output subdirectories do not have a >>>> *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 output files. >>>> Maker log looks ok to me, but I've attached it here as well. >>>> >>>> What could be the reason for this? >>>> >>>> Thanks >>>> dhivya >>>> >>>> >>>> >>>> >>>> >>> >> > From darasappan at gmail.com Thu Mar 13 10:47:25 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 13 Mar 2014 11:47:25 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Thanks, Carson, for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ESTs based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with > 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts. My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output.
When I reran maker, just changing the time allocated, it completed much faster, but is giving far fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something, although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene > models to output to those files (so their absence means there were > no gene models for that contig). Most sequences without protein/ > transcript fastas in your sample are very short and thus don't > contain anything. What is left either have no est2genome results or > the est2genome alignments do not have sufficient open reading frame > to be turned into a gene model (false merging of regions by trinity > can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a > limited set of genes that can be used for training SNAP/Augustus (so > not getting results on all contigs is expected). You really won't > get much as far as results until you have one of the ab initio > predictors turned on. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta > files missing > > All right, done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output.
>> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta >> files missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta >>> files missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is >>> blocking my email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference >>>> files etc and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3. concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, >>>> the second time around, many of the output subdirectories do not >>>> have a *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 >>>> output files. Maker log looks ok to me, but I've attached it >>>> here as well. >>>> >>>> What could be the reason for this?
>>>> >>>> Thanks >>>> dhivya >>>> >>> >>>> >>>> >>>> >>>> >>> >> > [Attachments scrubbed by the archive: transcripts.cat.fasta.old.gz (application/x-gzip, 7927581 bytes); proteins.cat.fasta.old.gz (application/x-gzip, 3668381 bytes)] From carsonhh at gmail.com Thu Mar 13 12:53:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 12:53:05 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> <672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com> <5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com> Message-ID: For future reference, I suggest using the .../maker/bin/fasta_merge tool to merge based on the datastore.index rather than other command-line methods. It will handle the multiple fasta types that are produced in the results, and will validate against the datastore.index file. Example: fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log The same is also true when merging gff3 files: gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log Thanks, Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 12:48 PM To: Carson Holt Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Ah, I forgot that some were called superscaffolds. That is a difference between the old and new assembly. This was definitely the issue. Thanks, and sorry for the mix-up.
Dhivya On Mar 13, 2014, at 12:51 PM, Carson Holt wrote: > Note that your command does not capture everything because not all scaffolds > start with the name "scaffold". > > This works, though: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, March 13, 2014 at 11:34 AM > To: Carson Holt > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Am I looking in the wrong place for my fasta files? I looked here: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l > > I see only 97 such files, so 97 contigs with transcripts.fasta files? > > When I count the number of sequences in all these files, I get 514 sequences. > > grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}' > > Could you tell me how and where you are getting the 21,183 transcripts? > > thanks > dhivya > > On Mar 13, 2014, at 12:21 PM, Carson Holt wrote: > >> This is what I see in your uploaded data. There are 21,183 transcripts from >> 201 contigs. Then there are 707 contigs with no gene models. >> >> -Carson >> >> >> From: Carson Holt >> Date: Thursday, March 13, 2014 at 11:11 AM >> To: dhivya arasappan >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> "as you saw from the output I uploaded before, the output certainly was much >> less than 20,000 transcripts." >> >> Actually there were 21,183 in the output you uploaded. I saw no loss of >> entries. >> >> -Carson >> >> From: dhivya arasappan >> Date: Thursday, March 13, 2014 at 11:09 AM >> To: Carson Holt >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> The datastore.index file looks fine; it has a started and finished status for >> my 980 scaffolds. 
I reran with increased time twice.
Second time around, I >> actually deleted the entire output directory to make sure it runs all over >> again. It still seemed to complete within a day. As you saw from the output >> I uploaded before, the output certainly was much less than 20,000 >> transcripts. Given that I was seeing great results for an older version of my >> assembly, I'm puzzled as to why my results are worse this time around. Any >> suggestions of what to check or what I can do to see improved results would >> be really helpful. >> >> I do know that I went from ~4% gaps to ~6% gaps in my new assembly; other >> than that, it's better in every way. Could this cause such a dramatic >> difference in results? >> >> Thanks >> dhivya >> >> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote: >> >>> The second time, it should have just started where it left off, so it would >>> run faster (because the processing from the previous job counted towards the >>> second one). The archived output you sent me had 21,183 proteins and >>> transcripts. If you are using fasta_merge to collect them, just make >>> sure the datastore.index file is not truncated or corrupt; otherwise it won't >>> collect all the fastas from every contig. You can rebuild the >>> datastore.index using the -dsindex flag with MAKER, if you want to check >>> that. Also, you can have maker just regenerate results without rerunning >>> BLAST etc. by using the -a flag, if you want to just recalculate all results >>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis). >>> >>> -Carson >>> >>> >>> From: dhivya arasappan >>> Date: Thursday, March 13, 2014 at 10:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Thanks, Carson, for the response. 
I understand that est2genome=1 does not use >>> any ab initio gene predictions, but simply identifies ESTs based on >>> alignment.
I'm a little confused because I ran maker on my assembly before, >>> using the same parameters (including est2genome=1). I got a very good >>> result with > 20,000 transcripts and proteins. >>> >>> Then I was able to get an improved assembly, where many scaffolds were >>> combined into superscaffolds. So I reran maker on this assembly. Same >>> parameters, same transcriptome and proteins files. Now, I see such >>> drastically different results: Only 500+ genes and transcripts. My >>> scaffolds are now bigger than before, so I'm not sure how this is happening. >>> These were the results I sent you. >>> >>> Another odd thing I noticed (and I am hesitant to report this because >>> perhaps it is due to some sort of error on my part): I ran maker on the >>> improved assembly the first time and maker did not complete in the 48 hours >>> I allocated. But I had 19,000+ transcripts in the unfinished output. When >>> I reran maker, just changing the time allocated, it completed much faster, >>> but is giving far fewer transcripts and proteins as output. Could >>> something like this happen? If not, then I'm guessing I must have changed >>> something, although I'm pretty sure that I did not change anything other than >>> the time allocated. I've attached the transcripts and proteins files from the >>> first time I ran maker on my improved assembly. >>> >>> Thanks again for your help >>> Dhivya >>> >>> >>> >>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: >>> >>>> Note protein/transcript fastas are only created when there are gene models >>>> to output to those files (so their absence means there were no gene models >>>> for that contig). Most sequences without protein/transcript fastas in your >>>> sample are very short and thus don't contain anything.
What is left either >>>> have no est2genome results or the est2genome alignments do not have >>>> sufficient open reading frame to be turned into a gene model (false merging >>>> of regions by trinity can cause this, so make sure you use the jaccard >>>> index option when assembling reads with trinity to avoid this). >>>> >>>> You are using only the est2genome=1 option. This will result in a limited >>>> set of genes that can be used for training SNAP/Augustus (so not getting >>>> results on all contigs is expected). You really won't get much as far as >>>> results until you have one of the ab initio predictors turned on. >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> From: dhivya arasappan >>>> Date: Tuesday, March 11, 2014 at 8:52 AM >>>> To: Carson Holt >>>> Cc: Daniel Ence >>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> All right, done. My username is daras >>>> >>>> Thanks >>>> Dhivya >>>> >>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: >>>> >>>>> Input and compressed file of output. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: dhivya arasappan >>>>> Date: Monday, March 10, 2014 at 2:09 PM >>>>> To: Carson Holt >>>>> Cc: Daniel Ence >>>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>>> missing >>>>> >>>>> Hi Carson, >>>>> >>>>> Do you mean the whole maker output? >>>>> >>>>> Thanks >>>>> dhivya >>>>> >>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >>>>> >>>>>> Could you upload everything here: >>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>> >>>>>> Then send us the link generated or your user ID.
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: dhivya arasappan >>>>>> Date: Monday, March 10, 2014 at 1:50 PM >>>>>> To: Carson Holt , Daniel Ence >>>>>> >>>>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>>>>> missing >>>>>> >>>>>> Hi Carson and Daniel, >>>>>> >>>>>> I'm sending this across to you separately since maker list is blocking my >>>>>> email due to attachment size. >>>>>> >>>>>> As always, thanks for any guidance you can provide. >>>>>> Dhivya >>>>>> >>>>>> >>>>>> Begin forwarded message: >>>>>> >>>>>>> From: dhivya arasappan >>>>>>> Date: March 10, 2014 3:14:03 PM CDT >>>>>>> To: maker-devel at yandell-lab.org >>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>>>>> missing >>>>>>> >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I've been running maker with different assembly files, reference files >>>>>>> etc and I check the output by: >>>>>>> >>>>>>> 1. concatenating the gff files >>>>>>> 2. concatenating the *transcripts.fasta files >>>>>>> 3. concatenating the *proteins.fasta files >>>>>>> >>>>>>> I'm noticing that when I ran maker twice with same parameters, the >>>>>>> second time around, many of the output subdirectories do not have a >>>>>>> *transcripts.fasta or *proteins.fasta file in it. >>>>>>> There are 251 subdirectories and only 97 of them have all 3 output >>>>>>> files. Maker log looks ok to me, but I've attached it here as well. >>>>>>> >>>>>>> What could be the reason for this? >>>>>>> >>>>>>> Thanks >>>>>>> dhivya >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
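The two checks discussed in this thread (whether the datastore index is complete, and how many transcripts were actually written, regardless of whether contig names start with "scaffold") can be sketched in Python as below. This is a minimal illustration, not part of MAKER: it assumes the master datastore index is tab-separated as contig ID, directory, status, and that per-contig transcript files end in `transcripts.fasta`; both are assumptions about the output layout. It does not replace fasta_merge, which also keeps the different FASTA types apart (the glob used here may also pick up ab initio transcript files if any are present).

```python
"""Sanity checks for a MAKER output directory (illustrative sketch only)."""
import glob
import os
from collections import Counter


def unfinished_contigs(index_lines):
    """Return contig IDs whose most recent index status is not FINISHED.

    Assumes tab-separated lines: contig_id <TAB> directory <TAB> status.
    Any non-FINISHED status is reported, so contigs that were skipped on
    purpose (for example very short ones) may show up here as well.
    """
    last_status = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or truncated line
        last_status[fields[0]] = fields[2]
    return sorted(c for c, s in last_status.items() if s != "FINISHED")


def count_transcripts(datastore_dir, suffix="transcripts.fasta"):
    """Count FASTA records per contig directory, whatever the contig is named."""
    counts = Counter()
    pattern = os.path.join(datastore_dir, "**", "*" + suffix)
    for path in glob.glob(pattern, recursive=True):
        contig = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            counts[contig] += sum(1 for line in handle if line.startswith(">"))
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_maker.py INDEX.log DATASTORE_DIR
    index_path, datastore_dir = sys.argv[1], sys.argv[2]
    with open(index_path) as handle:
        missing = unfinished_contigs(handle)
    counts = count_transcripts(datastore_dir)
    print(f"{len(missing)} contigs not FINISHED")
    print(f"{sum(counts.values())} transcripts in {len(counts)} contigs")
```

Walking the directory tree with a recursive glob avoids the name-prefix problem that the `sca*` pattern ran into, while the index check covers the truncated-index case Carson mentions.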
From cjfields at illinois.edu Thu Mar 13 15:04:23 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 13 Mar 2014 21:04:23 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome as well through the same route. chris On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote: One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement to GeneMark's prediction by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example, we are working on EVM integration right now: http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi). Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't find loss of any true genes if just using Augustus. Also, try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.
Thanks, Carson On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote: >I have been running MAKER 2.31 using Augustus and SNAP on an avian >genome. Augustus gives pretty decent gene model predictions based on a >custom model we have and the hints MAKER provides. However, SNAP seems >to throw out a ton of false positives; in many cases this appears to >cause erroneous gene fusions. Leaving out SNAP altogether, however, leads >to a marked decrease in # models overall, which is worse. GeneMark had a >very similar problem (high # false positives) and thus no marked >improvement, either when using it with both Augustus and SNAP or with >Augustus alone. > >I have been exploring using geneid >(http://genome.crg.es/software/geneid/) as an alternative, based on some >feedback on another project I worked with in the past. This would be >fed into MAKER using external GFF, but I wanted to see if anyone has >tried geneid with MAKER first. > >Finally, how hard would it be to incorporate alternative callers into >MAKER? For instance, would it be possible to add these like a "plugin"? > >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
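Sajeet's suggestion, quoted above, of training GeneMark on a hard-masked rather than soft-masked genome can be sketched in Python as follows. This is a minimal illustration that assumes repeats are soft-masked as lowercase bases (the common convention); it rewrites those bases to `N` and leaves FASTA header lines untouched. The function names are made up for this sketch and are not part of MAKER or GeneMark.

```python
"""Convert a soft-masked FASTA (lowercase repeats) to a hard-masked one."""
import sys


def hard_mask(sequence, mask_char="N"):
    """Replace every lowercase (soft-masked) base with mask_char."""
    return "".join(mask_char if base.islower() else base for base in sequence)


def hard_mask_fasta(lines, mask_char="N"):
    """Yield FASTA lines with sequence lines hard-masked; headers unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield hard_mask(line, mask_char)


if __name__ == "__main__":
    # Usage (hypothetical file names): python hard_mask.py < soft.fa > hard.fa
    sys.stdout.writelines(hard_mask_fasta(sys.stdin))
```

Headers are passed through as-is so that sequence IDs still match the soft-masked assembly used for the rest of the MAKER run.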
I have been playing around a little with the est_forward option, but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna From dence at genetics.utah.edu Fri Mar 14 11:32:02 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Mar 2014 17:32:02 +0000 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species. Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option, but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna
From jfierst at uoregon.edu Fri Mar 14 12:01:16 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 14 Mar 2014 11:01:16 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I will try it today. Thanks for the quick reply! On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence wrote: > Hi Janna, So do you have one strain that you want to use as the > reference for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name. I have been playing around a little with the est_forward option, but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna >
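Carson's reply in the following message suggests building the maker_map_ids input from reciprocal best BLASTP hits between the MAKER proteins and the reference strain's proteins. A minimal Python sketch of that step is shown below. It assumes both searches were run with BLAST+ tabular output (`-outfmt 6`, where column 12 is the bit score), keeps the highest-scoring subject for each query (ties keep the first hit seen), retains only pairs that are each other's best hit, and writes the two-column list (original ID, new ID) that Carson describes. File names and function names are illustrative only.

```python
"""Reciprocal best BLAST hits -> two-column ID map (illustrative sketch)."""
import csv


def best_hits(blast_tab_lines):
    """Map each query ID to its highest-bitscore subject (BLAST+ -outfmt 6)."""
    best = {}
    for row in csv.reader(blast_tab_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, ref_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted((m, r) for m, r in forward.items() if reverse.get(r) == m)


def write_id_map(pairs, out):
    """Write one 'original_id<TAB>new_id' line per pair."""
    for maker_id, ref_id in pairs:
        out.write(f"{maker_id}\t{ref_id}\n")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python rbh.py maker_vs_ref.tsv ref_vs_maker.tsv > map.txt
    with open(sys.argv[1]) as fwd, open(sys.argv[2]) as rev:
        write_id_map(reciprocal_best_hits(fwd, rev), sys.stdout)
```

Requiring the match in both directions excludes one-sided matches, such as a MAKER gene whose best hit prefers a different MAKER gene, which a single BLAST pass would accept.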
From carsonhh at gmail.com Fri Mar 14 12:02:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:02:48 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to know which genes you want to translate names to (two-column input file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST forward is the best way to do this, although I do think maker_map_ids is the tool to use in the end. The question is how to make the list of IDs to translate as the input to maker_map_ids. I would actually just use BLASTP against the reference strain, and then do reciprocal best BLAST hits. To do this, you BLAST your reference proteins against your maker proteins. Then do the opposite: BLAST your maker proteins against your reference proteins. If they are each other's best hit, then they are orthologous, and you can safely make a two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates into smug1). -Carson From: Daniel Ence Date: Friday, March 14, 2014 at 11:32 AM To: Janna Fierst , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] associating gene names between related strains Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.
Let me know if that's not what you're looking for, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu] Sent: Friday, March 14, 2014 10:06 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] associating gene names between related strains Hi, we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option, but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna From carsonhh at gmail.com Fri Mar 14 12:43:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 12:43:41 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> References: <3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com> <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com> Message-ID: I'm glad you were able to fix it. I'll check to see why it was failing as well. Thanks, Carson From: dhivya arasappan Date: Friday, March 14, 2014 at 10:16 AM To: Carson Holt Subject: Re: Error when running maker2zff script Kindly ignore my previous question. I was able to manipulate the scaffold names in the gff file to get maker2zff to work.
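Below is a minimal Python alternative to the sed substitution quoted in the forwarded message that follows. It assumes the goal is simply to make every scaffold ID consistent. The sketch percent-decodes the seqid column of each GFF3 feature line, so `scaffold3%2F0` and `scaffold3/0` become the same string, then replaces `/` with `_`, and applies the same rule to headers in a trailing `##FASTA` section. It does not modify IDs in column 9 or in `##sequence-region` pragmas. Whether that is enough for maker2zff is an assumption the thread does not confirm, and the same renaming would need to be applied anywhere else the IDs are used.

```python
"""Make GFF3 seqids consistent: decode %-escapes, then '/' -> '_' (sketch)."""
from urllib.parse import unquote


def safe_seqid(seqid):
    """Percent-decode a seqid (e.g. %2F -> '/') and replace '/' with '_'."""
    return unquote(seqid).replace("/", "_")


def normalize_gff3(lines):
    """Yield GFF3 lines with column 1 and ##FASTA headers normalized."""
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
            yield line
        elif in_fasta and line.startswith(">"):
            # Keep any description after the first space unchanged.
            name, sep, rest = line[1:].rstrip("\n").partition(" ")
            yield ">" + safe_seqid(name) + sep + rest + "\n"
        elif in_fasta or line.startswith("#"):
            yield line  # sequence lines, comments and pragmas pass through
        else:
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                cols[0] = safe_seqid(cols[0])
                yield "\t".join(cols) + "\n"
            else:
                yield line


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python fix_ids.py < in.gff > out.gff
    sys.stdout.writelines(normalize_gff3(sys.stdin))
```

Unlike the second `sed` in the quoted command, which lacks the `g` flag and so changes only the first `/` on each line, this rewrites each seqid as a whole field.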
Thanks dhivya On Mar 14, 2014, at 10:55 AM, dhivya arasappan wrote: > My message got flagged by the maker list again, so I'm forwarding this > separately to you. Is there a better way to send biggish files? > > > Thank you > Dhivya > > > > Begin forwarded message: > >> From: dhivya arasappan >> Subject: Error when running maker2zff script >> Date: March 13, 2014 at 8:35:27 PM CDT >> To: Carson Holt , maker-devel at yandell-lab.org >> >> Hi Carson, >> >> I used gff3_merge to create my gff file from maker output. I've attached it >> here. But when I run maker2zff on it, I get the following error: >> >> Can't use an undefined value as an ARRAY reference at >> /opt/apps/maker/2.30/bin/maker2zff line 177, line 7294251. >> >> It produces an incomplete output file and it looks like it may be running >> into problems when it encounters scaffold3%2F0. I'm wondering if it's having >> problems with my scaffold names. There seem to be some inconsistencies >> because it's referred to as scaffold3%F0 and scaffold3/0 in the gff file. >> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc. just >> fine. I did try replacing the scaffold names in the gff file, but still get >> the same error. Any ideas? >> >> Substitution command I used, for your reference: sed 's/3\%2F/3_/g' gfffile | sed 's/\//\_/' > mod.gfffile >> >> Thanks >> Dhivya >> > > From carsonhh at gmail.com Fri Mar 14 13:25:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 14 Mar 2014 13:25:58 -0600 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: We can look into it.
--Carson

From: "Fields, Christopher J" Date: Thursday, March 13, 2014 at 3:04 PM To: Sajeet Haridas Cc: Carson Holt , " List" Subject: Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is). Carson, re: geneid and "hints": it looks as if geneid can take some hints, such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome alignments as well through the same route.
chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote:
> One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome.
>
> On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote:
>> Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example, we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi).
>>
>> Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't lose any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.
>>
>> Thanks,
>> Carson
>>
>> On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote:
>>> I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in # models overall, which is worse. GeneMark had a very similar problem (high # false positives) and thus gave no marked improvement, either when used with both Augustus and SNAP or with Augustus alone.
>>>
>>> I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past. This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.
>>>
>>> Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add these like a "plugin"?
>>>
>>> chris
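Sajeet's workaround in this thread, training GeneMark on a hard-masked rather than a soft-masked genome, amounts to replacing every lowercase (soft-masked) base with N. The Python sketch below shows that conversion. It is illustrative only: it assumes repeats are marked as lowercase letters, and the function names are hypothetical, not part of MAKER or GeneMark.

```python
import re

# Assumption: repeats are soft-masked as lowercase letters, so every
# lowercase character in a sequence line is turned into 'N'.
_SOFT_MASKED = re.compile(r"[a-z]")


def hard_mask_sequence(seq: str) -> str:
    """Replace every soft-masked (lowercase) base with 'N'."""
    return _SOFT_MASKED.sub("N", seq)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked.

    Header lines (starting with '>') pass through unchanged so that the
    contig IDs stay identical to the soft-masked original.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield hard_mask_sequence(line)


def hard_mask_file(src_path: str, dst_path: str) -> None:
    """Write a hard-masked copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(hard_mask_fasta(src))
```

For example, `hard_mask_file("genome.softmasked.fasta", "genome.hardmasked.fasta")` (hypothetical file names) would produce the copy. In Sajeet's description, only the GeneMark self-training input is hard-masked, so the soft-masked original can still be kept for the rest of the pipeline.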
From cjfields at illinois.edu Fri Mar 14 20:22:55 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sat, 15 Mar 2014 02:22:55 +0000 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via GFF for now. If needed, I can work off an svn checkout and send along a patch, should I ever manage to eke out time to work on it.
chris

On Mar 14, 2014, at 2:25 PM, Carson Holt wrote:
> We can look into it.
> --Carson
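Carson's suggestion in this thread, to convert an external predictor's results to GFF3 and pass them in, which is also Chris's plan for geneid, can be sketched in Python as below. This is a minimal illustration, not MAKER code. It assumes the predictions have already been parsed into (seqid, strand, exon list) records and writes each one as a match feature with match_part children, which is one common layout for external predictions supplied through MAKER's `pred_gff` option. The function name and the default `source` value are hypothetical; check the MAKER documentation for the exact feature structure your version expects.

```python
from typing import List, Tuple


def prediction_to_gff3(pred_id: str, seqid: str, strand: str,
                       exons: List[Tuple[int, int]],
                       source: str = "geneid", score: str = ".") -> List[str]:
    """Return GFF3 lines for one prediction: a match plus match_part children.

    Assumptions: exons are (start, end) pairs, 1-based and inclusive as GFF3
    requires, and seqid matches the assembly FASTA header exactly.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not exons:
        raise ValueError("a prediction needs at least one exon")
    exons = sorted(exons)
    start = exons[0][0]
    end = max(ex_end for _, ex_end in exons)
    # Parent feature spanning all exons.
    columns = [seqid, source, "match", str(start), str(end), score, strand,
               ".", f"ID={pred_id};Name={pred_id}"]
    lines = ["\t".join(columns)]
    # One match_part per exon, numbered by ascending coordinate.
    for i, (ex_start, ex_end) in enumerate(exons, 1):
        columns = [seqid, source, "match_part", str(ex_start), str(ex_end),
                   score, strand, ".", f"ID={pred_id}:part{i};Parent={pred_id}"]
        lines.append("\t".join(columns))
    return lines
```

Writing a `##gff-version 3` line followed by these lines for every prediction yields a single file that could be listed under `pred_gff` in maker_opts.ctl. Emitting plain match/match_part features keeps the converter independent of how the predictor itself labels genes and transcripts.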
From carson.holt at genetics.utah.edu Mon Mar 17 13:45:15 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Mar 2014 19:45:15 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID:

I have attached 4 files for you to place in the .../maker/Widgets/ directory. The *blast.pm files will suppress the BLAST+ failures you are getting (alternatively, you can just downgrade to BLAST+ 2.2.27 to get the same effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. In the latest release, NCBI redid all their warnings and error codes, so it spits out a lot of garbage and fails with different messages than it did before. For example, BLAST now warns you every time it encounters a FASTA header with a comment (virtually every FASTA entry in existence falls in this category), so your screen will be awash with meaningless warning messages.

The fgenesh.pm file will fix the other failure, which only occurs if you use fgenesh simultaneously with the est_fusion=1 option. No other predictors are affected.

Thanks,
Carson

On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote:
>Dear Carson
>
>Sorry for the late reply. I was away for a couple of days. I have uploaded the output files plus control and error output to the FTP site that you provided.
>The user ID is borhanh
>
>I used blast+ for this run.
>
>Regards
>HB
>
>On 14-03-13 10:00 AM, "Carson Holt" wrote:
>>Just resending this to the correct maker-devel address. Please, when replying, do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" wrote:
>>>FGENESH is not a heavily used tool, so depending on which version it is (either too old or too new), output might be slightly different, which could cause incorrect parsing.
>>>Could you tar up your maker.output folder and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi (send me your user/guest ID after you upload)?
>>>
>>>For the BLAST error, use BLAST+ instead. You are using blastall, which is the old legacy version of NCBI BLAST. You can do this by setting the blast type in maker_bopts.ctl and the location of executables in maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote:
>>>>Dear Maker users
>>>>
>>>>I ran maker (2.31) on a fungal genome and found that it inserted the word SCALAR, followed by a hexadecimal value in parentheses like this (0x22de7020), into the nucleotide sequence of some of the genes. This seems to be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>Here is an example for one of the genes:
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was
>>>>used for the first round of maker (2.10) without such a problem. I checked the sequence for the scaffold related to one of the affected transcripts, and there was no error in the sequence. I am not sure what is causing this. The only error that I could spot in the output error file is the following:
>>>>
>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences.
>>>>
>>>>Your help is appreciated
>>>>
>>>>HB

Attachments (scrubbed from the archive): blastn.pm (text/x-perl-script, 8112 bytes), blastx.pm (text/x-perl-script, 8218 bytes), fgenesh.pm (text/x-perl-script, 19744 bytes), tblastx.pm (text/x-perl-script, 9113 bytes)

From carsonhh at gmail.com Mon Mar 17 15:14:42 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Mar 2014 15:14:42 -0600 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID:

Just an update on this. I've fixed the maker2zff script to handle the issues seen. Looking at this actually brought to light another issue: character escaping is specified inconsistently for GFF3 column 1 (the seqid), column 9 (the ID and Target attributes), and the FASTA IDs of embedded (##FASTA) sequences. We're updating the GFF3 spec to clarify this, so that the same ID is treated the same way for character escaping everywhere it appears.
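The escaping inconsistency Carson describes likely explains why Dhivya's GFF contained both scaffold3%2F0 and scaffold3/0: '/' is outside the set of characters the GFF3 specification allows unescaped in column 1, [a-zA-Z0-9.:^*$@!+_?-|], so some tools percent-encode it and others do not. The Python sketch below audits contig IDs and builds a consistent rename map before annotation. It assumes that every disallowed character should become '_' (an arbitrary choice) and that a literal '%' never occurs in the original IDs. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable
from urllib.parse import unquote

# Characters the GFF3 spec allows unescaped in column 1 (seqid):
# [a-zA-Z0-9.:^*$@!+_?-|]. This pattern matches anything else.
_DISALLOWED = re.compile(r"[^a-zA-Z0-9.:^*$@!+_?\-|]")


def is_safe_seqid(seqid: str) -> bool:
    """True if the ID contains only characters GFF3 allows unescaped."""
    return _DISALLOWED.search(seqid) is None


def sanitize_seqid(seqid: str, replacement: str = "_") -> str:
    """Decode %XX escapes, then replace each disallowed character.

    Assumption: a literal '%' never appears in the original IDs, so any
    %XX sequence is an escape added by some tool.
    """
    return _DISALLOWED.sub(replacement, unquote(seqid))


def build_rename_map(seqids: Iterable[str]) -> Dict[str, str]:
    """Map every spelling of an ID to one safe ID.

    'scaffold3%2F0' and 'scaffold3/0' decode to the same contig, so both may
    map to 'scaffold3_0'; two *different* contigs mapping to one name raise.
    """
    mapping: Dict[str, str] = {}
    origin: Dict[str, str] = {}  # safe ID -> decoded original ID
    for old in seqids:
        decoded = unquote(old)
        new = sanitize_seqid(old)
        if origin.get(new, decoded) != decoded:
            raise ValueError(f"{decoded!r} and {origin[new]!r} both map to {new!r}")
        origin[new] = decoded
        mapping[old] = new
    return mapping
```

The mapping would need to be applied consistently to the assembly FASTA headers, to GFF3 column 1, and to any ID, Parent, or Target values that embed the contig name. A partial substitution can leave old and new spellings mixed; for instance, `sed 's/\//\_/'` without the `g` flag changes only the first '/' on each line.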
To be safe, though, only use these characters in your contig IDs for the assembly when using any tool that reads or outputs GFF3: a-zA-Z0-9.:^*$@!+_?-|

Any character not in that set has a high chance of breaking some downstream tool. For now, just assume that the strict interpretation from the GFF3 spec for column 1 must be applied to all IDs everywhere (see below).

>>Column 1: "seqid"
>>The ID of the landmark used to establish the coordinate system for the current feature.
>>IDs may contain any characters, but must escape any characters not in the set [a-zA-Z0-9.:^*$@!+_?-|].
>>In particular, IDs may not contain unescaped whitespace and must not begin with an unescaped ">".

Thanks,
Carson

From darasappan at gmail.com Mon Mar 17 15:20:18 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 17 Mar 2014 16:20:18 -0500 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID:

Awesome! Thanks Carson.
Dhivya

From marc.hoeppner at bils.se Tue Mar 18 05:43:43 2014 From: marc.hoeppner at bils.se (Marc Höppner) Date: Tue, 18 Mar 2014 12:43:43 +0100 Subject: [maker-devel] Maker changes 2.30-2.31 Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>

Hi,

I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a changelog somewhere with information on what, if anything, was changed between 2.30 and 2.31.

There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker, but I'd like to make sure, if possible. Both versions use the exact same external binaries etc., and were run on the same data. 2.30 is running along happily; 2.31, however, has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run.

I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.

Cheers,
Marc

Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at bils.se

From carsonhh at gmail.com Tue Mar 18 09:07:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 09:07:07 -0600 Subject: [maker-devel] Maker changes 2.30-2.31 In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Message-ID:

Attached. Also make sure you are using the tarball from the lab website and not the prerelease from the subversion repository.
Thanks,
Carson

Attached changelog:
r1060 | cholt | 2013-11-04 11:18:12 -0700 (Mon, 04 Nov 2013) | MAKER stable release version 2.30
r1061 | cholt | 2013-11-10 22:19:51 -0700 (Sun, 10 Nov 2013) | altered build install slightly
r1062 | cholt | 2013-11-25 09:33:16 -0700 (Mon, 25 Nov 2013) | updated fgenesh for hint based annotation error
r1063 | cholt | 2013-12-05 14:10:42 -0700 (Thu, 05 Dec 2013) | fix repeat too short output error
r1064 | cholt | 2013-12-05 14:18:04 -0700 (Thu, 05 Dec 2013) | updated installation scripts
r1065 | cholt | 2013-12-13 08:42:08 -0700 (Fri, 13 Dec 2013) | fix fully masked failure for BLAST 2.2.25
r1066 | cholt | 2014-01-09 10:45:08 -0700 (Thu, 09 Jan 2014) | update MWAS and maker2jbrowse
r1067 | cholt | 2014-01-09 11:34:18 -0700 (Thu, 09 Jan 2014) | fix invalid character in Ecoli example fasta
r1068 | cholt | 2014-01-24 10:42:15 -0700 (Fri, 24 Jan 2014) | added iprscan to maker.css for MWAS
r1070 | cholt | 2014-01-26 20:27:52 -0700 (Sun, 26 Jan 2014) | attempt to fix ipr_update issues with Name ne to ID and fix lock with GFF3DB as well as docs for JBrowse and MAKER install
r1071 | cholt | 2014-01-26 20:41:55 -0700 (Sun, 26 Jan 2014) | alter install to hide MWAS fix skip of small contigs and map forward of genes with est_forward
r1072 | cholt | 2014-01-28 11:20:41 -0700 (Tue, 28 Jan 2014) | added message to get user to use the correct maker executable and updated INSTALL docs
r1073 | cholt | 2014-01-28 11:36:19 -0700 (Tue, 28 Jan 2014) | further update to maker from wrong directory message when name has whitespace
r1074 | cholt | 2014-02-03 14:48:05 -0700 (Mon, 03 Feb 2014) | fixed segfault on exit for OpenMPI
r1075 | cholt | 2014-02-03 15:32:38 -0700 (Mon, 03 Feb 2014) | added support for optional test compiler flags to be used with MVAPICH2
r1076 | cholt | 2014-02-03 15:38:52 -0700 (Mon, 03 Feb 2014) | fixed build commit missing m option
r1077 | cholt | 2014-02-04 14:29:43 -0700 (Tue, 04 Feb 2014) | made MPI communication always serialize
r1078 | cholt | 2014-02-05 11:23:10 -0700 (Wed, 05 Feb 2014) | updated MPI calling to use probe for size rather than another message for faster performance
r1079 | cholt | 2014-02-06 08:29:45 -0700 (Thu, 06 Feb 2014) | fixed labeling bug, fixed hanging MPI calls, fixed trnascan introns, and length
r1080 | cholt | 2014-02-11 10:08:33 -0700 (Tue, 11 Feb 2014) | switch FindBin::Bin for FindBin::RealBin throughout
r1081 | cholt | 2014-02-11 10:49:24 -0700 (Tue, 11 Feb 2014) | MAKER stable release version 2.31

From fbarreto at ucsd.edu Tue Mar 18 10:08:47 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:08:47 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP Message-ID:

Hi, all, I've been learning a lot from reading posts from this
group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230 Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on BLAST alignments to Swiss-Prot (query-subject coverage, bit score, etc.). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another Maker run using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).

Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093

From carsonhh at gmail.com Tue Mar 18 10:14:29 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 10:14:29 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID:

That sounds good. 1,500 initial models should be more than sufficient for the first round of training.
--Carson
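Felipe's first step in this thread, picking a few thousand high-confidence ESTs by the coverage and bit score of their BLAST hits to Swiss-Prot, can be sketched in Python as below. It is only an illustration under stated assumptions. It expects BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid bitscore qstart qend sstart send qlen slen"`, keeps each EST's best hit by bit score, and applies query-coverage, subject-coverage and bit-score cutoffs. The default cutoffs are placeholders, not values from the thread, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Assumed column order, matching BLAST+ run with
#   -outfmt "6 qseqid sseqid bitscore qstart qend sstart send qlen slen"
FIELDS = ["qseqid", "sseqid", "bitscore", "qstart", "qend",
          "sstart", "send", "qlen", "slen"]


def read_tabular(path: str) -> List[List[str]]:
    """Read tab-separated BLAST+ output, skipping blank and '#' comment lines."""
    with open(path, newline="") as fh:
        return [row for row in csv.reader(fh, delimiter="\t")
                if row and not row[0].startswith("#")]


def best_hits(rows: Iterable[List[str]]) -> Dict[str, dict]:
    """Keep only the highest-bitscore HSP for each query (EST)."""
    best: Dict[str, dict] = {}
    for row in rows:
        hit = dict(zip(FIELDS, row))
        for key in FIELDS[2:]:
            hit[key] = float(hit[key])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def coverage(start: float, end: float, length: float) -> float:
    """Fraction of a sequence spanned by one HSP (either orientation)."""
    return (abs(end - start) + 1) / length


def select_high_quality(rows: Iterable[List[str]], min_qcov: float = 0.8,
                        min_scov: float = 0.8,
                        min_bitscore: float = 200.0) -> List[str]:
    """Return sorted IDs of ESTs whose best hit passes all three cutoffs.

    The default cutoffs are placeholders, not thresholds from the thread.
    """
    selected = []
    for qid, hit in best_hits(rows).items():
        qcov = coverage(hit["qstart"], hit["qend"], hit["qlen"])
        scov = coverage(hit["sstart"], hit["send"], hit["slen"])
        if (qcov >= min_qcov and scov >= min_scov
                and hit["bitscore"] >= min_bitscore):
            selected.append(qid)
    return sorted(selected)
```

A call such as `select_high_quality(read_tabular("ests_vs_sprot.tsv"))` (hypothetical file name) returns the EST IDs to keep, which could then be used to subset the EST FASTA for the est2genome=1 run. Because only the single best HSP is considered, coverage is underestimated when an alignment is split across several HSPs.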
From dence at genetics.utah.edu Tue Mar 18 10:16:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 18 Mar 2014 16:16:20 +0000 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID:

Hi Felipe,

I think 1500 models sounds like a good-sized set with which to train SNAP; I think SNAP expects ~1000 models for training. My only other comment on the approach is that using only one ab initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select, from among their different models, the one that best fits the evidence.

Good luck, and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330

From barry.utah at gmail.com Tue Mar 18 10:26:45 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 18 Mar 2014 10:26:45 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>

Hi Felipe,

I think that plan sounds quite reasonable. To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on. Since you're an order of magnitude above that, I think you're in good shape. Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing.

Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor. Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.
This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER.

Thanks,
B

Barry Moore Research Scientist Dept.
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Tue Mar 18 10:59:39 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:59:39 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> References: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Message-ID: Thanks, guys, for the swift and informative response! I will try to train Augustus again, but at the very least, will include it with an arthropod HMM in my final run (in addition to my custom SNAP HMM). Cheers, Felipe On Tue, Mar 18, 2014 at 9:26 AM, Barry Moore wrote: > Hi Felipe, > > I think that plan sounds quite reasonable. To address your primary > concern, most gene prediction tools recommend something in the range of a > minimum of a few hundred gene models to train on. Since your an order of > magnitude above that I think your in good shape. Having said that, of > course if you have concerns about biases in your training set you may be > able to supplement it further by using a tool like CEGMA ( > http://korflab.ucdavis.edu/datasets/cegma/) to include high confidence > genes that your set is missing. > > Since the final gene set will only be as complete as the gene predictions > that MAKER has to choose from I would suggest that you also consider > including at least one other gene predictor. Augustus works well on a wide > variety of genomes and while it is more difficult to train than SNAP it > does accept hints from MAKER and will likely add to the diversity of the > final gene set, even if you choose to use an existing HMM that has some > reasonable relationship to your genome. 
This is one of the advantages of > MAKER supervision, while it would be best to train Augustus as well, MAKER > will ensure that the final models are not too far out of line with the > evidence and you'll likely see quite good results using a custom SNAP HMM > and an existing Augustus HMM as predictor within MAKER. > > Thanks, > > B > > On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote: > > Hi, all, > > I've been learning a lot from reading posts from this group, and finally > started doing actual runs of Maker on our current genome assembly > (arthropod, genome size ~230Mb). I started by training SNAP, but would > like to check my approach before continuing with longer runs. > > From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I > deemed of very high quality based on blast alignments to Swiss-Prot (based > on query-subject coverage, bit score, etc). I then used only these 2000 > ESTs in a first Maker run using est2genome=1. The output returned 1500 > models (with the 500 "missing" models probably a result of single-exon > issues; not a concern at this point). > > I now plan on training SNAP with this first output, and then doing another > Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein > evidence, and 3) SNAP with the first HMM file. The output of this second > run will be used to re-train SNAP, and this second HMM file will be used in > a final "official" run (while continuing to provide the EST and protein > evidence, of course). > > Does this sound like a reasonable approach? Simply put, my main concern > is whether I'm using too few ESTs in my first est2genome step. > > Thanks for any insight! 
> > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Tue Mar 18 13:27:11 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 14:27:11 -0500 Subject: [maker-devel] maker snap output files Message-ID: Hello, I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully and when I ran fasta_merge, I got several output fasta files. I?m unable to find information on the tutorial about interpreting these different files. I?m hoping one of you can help. *maker.proteins.fasta *maker.snap_masked.proteins.fasta *maker.non_overlapping_ab_initio.proteins.fasta What is the difference among these? They all have different number of sequences. Similarly,with transcripts: maker.non_overlapping_ab_initio.transcripts.fasta maker.snap_masked.transcripts.fasta maker.transcripts.fasta Thanks Dhivya -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Tue Mar 18 13:34:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 13:34:05 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want) maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes) maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use interproscan (very slow) to analyze this file for false negatives. These files are also described in the README distributed with MAKER in the ?MAKER OUTPUT? section. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 18, 2014 at 1:27 PM To: Carson Holt , Subject: maker snap output files Hello, I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully and when I ran fasta_merge, I got several output fasta files. I?m unable to find information on the tutorial about interpreting these different files. I?m hoping one of you can help. *maker.proteins.fasta *maker.snap_masked.proteins.fasta *maker.non_overlapping_ab_initio.proteins.fasta What is the difference among these? They all have different number of sequences. Similarly,with transcripts: maker.non_overlapping_ab_initio.transcripts.fasta maker.snap_masked.transcripts.fasta maker.transcripts.fasta Thanks Dhivya -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Tue Mar 18 14:05:39 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 15:05:39 -0500 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Thanks Carson. 
Is it normal that in my maker results after running SNAP, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-SNAP maker results? I assumed that annotations from alignment plus annotations from prediction would add up to more annotations.

The unfiltered proteins file has more proteins, though.

Thanks
Dhivya

From carsonhh at gmail.com Tue Mar 18 14:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID:

There can also be hint-based predictions. They may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over-predict (by as much as 10-fold). You should always review your annotations in something like Apollo, to see how the models compare to the evidence. Counts alone don't really mean anything.

Thanks,
Carson

From chrisbioinfo at gmail.com Wed Mar 19 05:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2
Message-ID:

Hello,

I'm installing/using maker2 for the first time and I get an error when using it. I'm certainly missing something, but I don't know what. I compiled maker with no error message, and I have all these directories after compilation:

bin data GMOD INSTALL lib LICENSE MWAS perl README src

Nevertheless, when I try maker2 on the test data (dpp_contig.fasta), I get this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500

...

Ideas?

Best,
Christelle

From carsonhh at gmail.com Wed Mar 19 07:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To:
References:
Message-ID:

Your problem is one of the following: you need to reinstall the DBD::SQLite module; you are running in a directory you don't have permissions for; you set your TMPDIR environment variable or the TMP value in maker_opts.ctl to an NFS-mounted or memory-mounted directory; or you are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that has issues (probably with the DB or SQLite modules).

You can also completely delete the output directory and start again, to see if it was just a random error. You should look at each of those first. If none of them seem to be the issue, you can run MAKER with the --debug command-line flag and send the output to me.

Thanks,
Carson

From rbharris at uw.edu Wed Mar 19 19:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID:

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once, and used the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time, probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times, and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu Wed Mar 19 23:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To:
References:
Message-ID:

Hi Rebecca,

As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 or 10 kbp won't contribute much to the gene count in the end.

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences (with tblastx) takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look.

I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com Thu Mar 20 11:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To:
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts, and transcripts that don't overlap, which ones are reflected in the GFF file? The IDs in the GFF file (for exons, genes, mRNAs) all say something like "*snap-gene", so does this mean these are the genes from the SNAP prediction tool?

Thanks
dhivya

From carsonhh at gmail.com Thu Mar 20 11:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID:

MAKER transcripts will be the gene/mRNA/exon/CDS features. All other transcripts, from SNAP etc., will be match/match_part features in the GFF3. When you look at these in something like Apollo, they will be placed in different viewing panels based on their type.

Thanks,
Carson

From carsonhh at gmail.com Thu Mar 20 11:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To:
References:
Message-ID:

You may also want to try the GFF3 pass-through options. Basically, you give your GFF3 file to maker_gff and tell it what kinds of evidence to keep from your past run by setting the 'pass' options to 1. Then you can run without your FASTA file inputs for ESTs, proteins, and repeats (and blank out the RepeatMasker species as well). The values will be passed forward from the GFF3 file into the current run.

--Carson

From carsonhh at gmail.com Fri Mar 21 08:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To:
References:
Message-ID:

Glad it's working. Let us know if anything else comes up.

--Carson

From: Chris Bioinfo
Date: Friday, March 21, 2014 at 4:57 AM
To: Carson Holt
Subject: Re: [maker-devel] Annotation with maker2

Dear Carson,

It works!! After many difficulties: I installed SQLite 3.8.4.1 yesterday and it was "better" (no error message when launching sqlite3), yet my test.db was still not created. Today I found the trick! The problem was due to the too-long path where I was creating the db... only that... Thanks for your time and your help, Carson!
All the best,
Christelle

2014-03-20 18:21 GMT+01:00 Carson Holt :
> Also, you can use this command line to test both before and after installing:
>
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
>
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in the error.
>
> --Carson
>
>
> From: Carson Holt
> Date: Thursday, March 20, 2014 at 11:03 AM
> To: Chris Bioinfo
> Subject: Re: [maker-devel] Annotation with maker2
>
> The failure is in SQLite, so you have to reinstall, i.e. 'force install DBD::SQLite' in CPAN. Otherwise you are just keeping whatever module is installed, which may have broken C bindings.
>
> You may also have to install SQLite 3.8.4.1, and then reinstall the Perl modules using the force option to force a recompile.
>
> --Carson
>
>
> From: Chris Bioinfo
> Date: Thursday, March 20, 2014 at 10:57 AM
> To: Carson Holt
> Subject: Re: [maker-devel] Annotation with maker2
>
> cpan[2]> install DBI
> DBI is up to date (1.631).
>
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
>
> My test.db is indeed not created:
>
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite>
>
>
> 2014-03-20 17:36 GMT+01:00 Carson Holt :
>> I'm actually checking the mount points for the disk. SQLite won't work on filesystems that don't implement locks, and 'df' is a good way to infer some of that info.
>>
>> Basically, I still think this is SQLite failing on your system. You might need to reinstall SQLite and then reinstall the Perl DBI and DBD::SQLite modules.
>>
>> You can also do a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>>
>> This will work if you have sqlite3 installed. And any error it gives may be informative.
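Carson's manual sqlite3 check above can also be approximated outside Perl. The sketch below is a rough analogue only, assuming Python 3 with its standard-library sqlite3 module; MAKER itself opens the database through Perl DBI/DBD::SQLite, so passing this check does not prove the Perl modules are healthy. The function name check_sqlite_path is hypothetical. It opens (creating if needed) the given database file, briefly asks for a write lock with BEGIN IMMEDIATE and rolls it back, and reports the absolute path length, since in this thread the failure eventually turned out to be caused by an overly long path.

```python
import os
import sqlite3
import sys


def check_sqlite_path(db_path):
    """Hypothetical helper: try to open (creating if needed) an SQLite DB.

    Returns a tuple (ok, message). The message includes the absolute path
    and its length in characters, which is useful when long paths are suspected.
    """
    abs_path = os.path.abspath(db_path)
    label = f"{abs_path} ({len(abs_path)} chars)"
    try:
        # isolation_level=None puts the connection in autocommit mode, so the
        # explicit BEGIN/ROLLBACK below are passed straight through to SQLite.
        conn = sqlite3.connect(abs_path, isolation_level=None)
        try:
            # BEGIN IMMEDIATE requests a write lock on the file; permission or
            # locking problems may surface here rather than later.
            conn.execute("BEGIN IMMEDIATE")
            conn.execute("ROLLBACK")
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return False, f"{label}: {exc}"
    return True, f"{label}: OK"


if __name__ == "__main__":
    # Check each database path given on the command line.
    for path in sys.argv[1:]:
        print(check_sqlite_path(path)[1])
```

For example, `python3 check_sqlite_path.py dpp_contig.maker.output/test.db` (the script file name is likewise hypothetical). As with Carson's suggestion, point it at a throwaway name such as test.db rather than at a database MAKER is actively using, since the check creates the file if it does not exist.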
>>
>> --Carson
>>
>> From: Chris Bioinfo
>> Date: Thursday, March 20, 2014 at 10:29 AM
>> To: Carson Holt
>> Subject: Re: [maker-devel] Annotation with maker2
>>
>> Oh, sorry.
>>
>> My disks are quite full, but there is still space for maker, I guess:
>>
>> /dev/sdc1 19T 18T 934G 95% /home
>>
>>
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo :
>>> This:
>>>
>>> du -h dpp_contig.maker.output/
>>> 0 dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K dpp_contig.maker.output/maker_bopts.log
>>> 4.0K dpp_contig.maker.output/maker_exe.log
>>> 8.0K dpp_contig.maker.output/maker_opts.log
>>> 16K dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M dpp_contig.maker.output/mpi_blastdb
>>> 0 dpp_contig.maker.output/seen.dbm
>>>
>>>
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt :
>>>
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>>
>>>> --Carson
>>>>
>>>> From: Chris Bioinfo
>>>> Date: Thursday, March 20, 2014 at 10:00 AM
>>>> To: Carson Holt
>>>> Subject: Re: [maker-devel] Annotation with maker2
>>>>
>>>> Sorry, mistake on the dir!
>>>> >>>> I have these files: >>>> dpp_contig_datastore dpp_contig_master_datastore_index.log >>>> maker_bopts.log maker_exe.log maker_opts.log mpi_blastdb seen.dbm >>>> >>>> >>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo : >>>>> no, >>>>> >>>>> I have these files in the directory: >>>>> dpp_contig.fasta dpp_est.fasta hsap_contig.fasta >>>>> hsap_protein.fasta maker_exe.ctl >>>>> dpp_contig.maker.output dpp_protein.fasta hsap_est.fasta >>>>> maker_bopts.ctl maker_opts.ctl te_proteins.fasta >>>>> >>>>> >>>>> >>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt : >>>>> >>>>>> Did >>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist? >>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Chris Bioinfo >>>>>> Date: Thursday, March 20, 2014 at 9:50 AM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>> >>>>>> cdantec at belem:~$ /usr/bin/perl -v >>>>>> >>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for >>>>>> x86_64-linux-gnu-thread-multi >>>>>> (with 46 registered patches, see perl -V for more detail) >>>>>> >>>>>> Copyright 1987-2013, Larry Wall >>>>>> >>>>>> Perl may be copied only under the terms of either the Artistic License or >>>>>> the >>>>>> GNU General Public License, which may be found in the Perl 5 source kit. >>>>>> >>>>>> Complete documentation for Perl, including FAQ lists, should be found on >>>>>> this system using "man perl" or "perldoc perl". If you have access to >>>>>> the >>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page. >>>>>> >>>>>> >>>>>> >>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt : >>>>>>> What do you get when you type --> /usr/bin/perl -v >>>>>>> >>>>>>> The key to the error is this line --> >>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open >>>>>>> database file >>>>>>> >>>>>>> Either the database doesn't exist or it is corrupt. Does it exist? >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Thursday, March 20, 2014 at 9:25 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Dear Carson, >>>>>>> >>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my >>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is in >>>>>>> /usr/bin/perl. >>>>>>> I have deleted the output directory many times, but I still get the same problem. >>>>>>> >>>>>>> So here is the debug output: >>>>>>> ****MODULE VERSION INFO >>>>>>> 0.05 Acme::Damn /usr/local/lib/perl/5.18.1/Acme/Damn.pm >>>>>>> 1.01 AnyDBM_File /usr/share/perl/5.18/AnyDBM_File.pm >>>>>>> 5.73 AutoLoader /usr/share/perl/5.18/AutoLoader.pm >>>>>>> UNKNOWN Bio::AnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm >>>>>>> UNKNOWN Bio::AnnotatableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm >>>>>>> UNKNOWN Bio::Annotation::Collection >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm >>>>>>> UNKNOWN Bio::Annotation::SimpleValue >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm >>>>>>> UNKNOWN Bio::Annotation::TypeManager >>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm >>>>>>> UNKNOWN Bio::AnnotationCollectionI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm >>>>>>> UNKNOWN Bio::AnnotationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm >>>>>>> 1.006923 Bio::DB::Fasta >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm >>>>>>> UNKNOWN Bio::DB::InMemoryCache >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm >>>>>>> UNKNOWN Bio::DB::IndexedBase >>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm >>>>>>> UNKNOWN Bio::DB::RandomAccessI >>>>>>> 
/usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm >>>>>>> UNKNOWN Bio::DB::SeqI >>>>>>>
/usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm >>>>>>> UNKNOWN Bio::DescribableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm >>>>>>> UNKNOWN Bio::Event::EventGeneratorI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm >>>>>>> UNKNOWN Bio::Event::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm >>>>>>> UNKNOWN Bio::Factory::ObjectFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm >>>>>>> UNKNOWN Bio::Factory::SequenceFactoryI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm >>>>>>> UNKNOWN Bio::FeatureHolderI >>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm >>>>>>> UNKNOWN Bio::IdentifiableI >>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm >>>>>>> UNKNOWN Bio::LocatableSeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm >>>>>>> UNKNOWN Bio::Location::Atomic >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm >>>>>>> UNKNOWN Bio::Location::CoordinatePolicyI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm >>>>>>> UNKNOWN Bio::Location::Fuzzy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm >>>>>>> UNKNOWN Bio::Location::FuzzyLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm >>>>>>> UNKNOWN Bio::Location::Simple >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm >>>>>>> UNKNOWN Bio::Location::Split >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm >>>>>>> UNKNOWN Bio::Location::SplitLocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm >>>>>>> UNKNOWN Bio::Location::WidestCoordPolicy >>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm >>>>>>> UNKNOWN Bio::LocationI >>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm >>>>>>> UNKNOWN Bio::PrimarySeq >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm 
>>>>>>> 1.006923 Bio::PrimarySeqI >>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm >>>>>>> UNKNOWN Bio::Range /usr/local/share/perl/5.18.1/Bio/Range.pm >>>>>>> UNKNOWN Bio::RangeI /usr/local/share/perl/5.18.1/Bio/RangeI.pm >>>>>>> 1.006923 Bio::Root::Exception >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm >>>>>>> UNKNOWN Bio::Root::HTTPget >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm >>>>>>> UNKNOWN Bio::Root::IO >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm >>>>>>> 1.006923 Bio::Root::Root >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm >>>>>>> 1.006923 Bio::Root::RootI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm >>>>>>> 1.006923 Bio::Root::Version >>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm >>>>>>> UNKNOWN Bio::Search::HSP::GenericHSP >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm >>>>>>> UNKNOWN Bio::Search::HSP::HSPI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.p>>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/august >>>>>>> us.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn >>>>>>> .pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx >>>>>>> .pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2g >>>>>>> enome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2ge >>>>>>> nome.pm >>>>>>> UNKNOWN 
Bio::Search::HSP::PhatHSP::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenes >>>>>>> h.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genema >>>>>>> rk.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protei >>>>>>> n2genome.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeat >>>>>>> masker.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snosca >>>>>>> n.pm >>>>>>> 0.01 Bio::Search::HSP::PhatHSP::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblast >>>>>>> x.pm >>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnasc >>>>>>> an.pm >>>>>>> 1.006923 Bio::Search::Hit::GenericHit >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm >>>>>>> UNKNOWN Bio::Search::Hit::HitI >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.p>>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/august >>>>>>> us.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastn >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn >>>>>>> .pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx >>>>>>> .pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2g >>>>>>> enome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2ge >>>>>>> nome.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenes >>>>>>> h.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genema >>>>>>> rk.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::gff3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protei >>>>>>> n2genome.pm >>>>>>> 1.006923 Bio::Search::Hit::PhatHit::repeatmasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeat >>>>>>> masker.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.p >>>>>>> m >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snosca >>>>>>> n.pm >>>>>>> 0.01 Bio::Search::Hit::PhatHit::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblast >>>>>>> x.pm >>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnasc >>>>>>> an.pm >>>>>>> 1.006923 Bio::Search::SearchUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm >>>>>>> UNKNOWN Bio::SearchIO >>>>>>> 
/usr/local/share/perl/5.18.1/Bio/SearchIO.pm >>>>>>> UNKNOWN Bio::SearchIO::EventHandlerI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm >>>>>>> UNKNOWN Bio::SearchIO::SearchResultEventBuilder >>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm >>>>>>> UNKNOWN Bio::Seq /usr/local/share/perl/5.18.1/Bio/Seq.pm >>>>>>> UNKNOWN Bio::Seq::SeqFactory >>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm >>>>>>> UNKNOWN Bio::SeqAnalysisParserI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm >>>>>>> UNKNOWN Bio::SeqFeature::FeaturePair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm >>>>>>> UNKNOWN Bio::SeqFeature::Generic >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm >>>>>>> UNKNOWN Bio::SeqFeature::Similarity >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm >>>>>>> UNKNOWN Bio::SeqFeature::SimilarityPair >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm >>>>>>> UNKNOWN Bio::SeqFeatureI >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm >>>>>>> UNKNOWN Bio::SeqI /usr/local/share/perl/5.18.1/Bio/SeqI.pm >>>>>>> UNKNOWN Bio::SeqUtils >>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm >>>>>>> 1.006923 Bio::Tools::CodonTable >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm >>>>>>> UNKNOWN Bio::Tools::GFF >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm >>>>>>> 1.006923 Bio::Tools::IUPAC >>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm >>>>>>> 7.3 Bit::Vector /usr/local/lib/perl/5.18.1/Bit/Vector.pm >>>>>>> 0.01 CGL::Annotation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm >>>>>>> 0.01 CGL::Annotation::Feature >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm >>>>>>> 0.01 CGL::Annotation::Feature::Contig >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig >>>>>>> .pm >>>>>>> 0.01 CGL::Annotation::Feature::Exon >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.p>>>>>>> m >>>>>>> 0.01 CGL::Annotation::Feature::Gene >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.p>>>>>>> m >>>>>>> 0.01 CGL::Annotation::Feature::Intron >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron >>>>>>> .pm >>>>>>> 0.01 CGL::Annotation::Feature::Protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protei >>>>>>> n.pm >>>>>>> 0.01 CGL::Annotation::Feature::Sequence_variant >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequen >>>>>>> ce_variant.pm >>>>>>> 0.01 CGL::Annotation::Feature::Transcript >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transc >>>>>>> ript.pm >>>>>>> 0.01 CGL::Annotation::FeatureLocation >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocatio >>>>>>> n.pm >>>>>>> 0.01 CGL::Annotation::FeatureRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelatio >>>>>>> nship.pm >>>>>>> 0.01 CGL::Annotation::Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm >>>>>>> 0.01 CGL::Annotation::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm >>>>>>> 0.01 CGL::Clone >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm >>>>>>> 0.01 CGL::Ontology::Node >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm >>>>>>> 0.01 CGL::Ontology::NodeRelationship >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship >>>>>>> .pm >>>>>>> 0.01 CGL::Ontology::Ontology >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm >>>>>>> 0.01 CGL::Ontology::Parser::OBO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm >>>>>>> 0.01 CGL::Ontology::SO >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm >>>>>>> 0.01 
CGL::Ontology::Trace >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm >>>>>>> 0.01 CGL::Revcomp >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm >>>>>>> 0.01 CGL::TranslationMachine >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm >>>>>>> 1.32 Carp /usr/local/share/perl/5.18.1/Carp.pm >>>>>>> 1.32 Carp::Heavy /usr/local/share/perl/5.18.1/Carp/Heavy.pm >>>>>>> 0.64 Class::Struct /usr/share/perl/5.18/Class/Struct.pm >>>>>>> 0.36 Clone /usr/local/lib/perl/5.18.1/Clone.pm >>>>>>> 5.018001 Config /usr/lib/perl/5.18/Config.pm >>>>>>> 3.40 Cwd /usr/lib/perl/5.18/Cwd.pm >>>>>>> 1.42 DBD::SQLite /usr/local/lib/perl/5.18.1/DBD/SQLite.pm >>>>>>> 1.631 DBI /usr/local/lib/perl/5.18.1/DBI.pm >>>>>>> 1.827 DB_File /usr/lib/perl/5.18/DB_File.pm >>>>>>> 2.145 Data::Dumper /usr/lib/perl/5.18/Data/Dumper.pm >>>>>>> 0.11 Datastore::Base >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm >>>>>>> 0.01 Datastore::MD5 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm >>>>>>> 2.53 Digest::MD5 /usr/local/lib/perl/5.18.1/Digest/MD5.pm >>>>>>> 1.16 Digest::base /usr/share/perl/5.18/Digest/base.pm >>>>>>> >>>>>>> UNKNOWN Dumper::GFF::GFFV3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm >>>>>>> UNKNOWN Dumper::XML::Game >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm >>>>>>> UNKNOWN Dumper::XML::Game_Xml >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm >>>>>>> 1.18 DynaLoader /usr/lib/perl/5.18/DynaLoader.pm >>>>>>> 1.18 Errno /usr/lib/perl/5.18/Errno.pm >>>>>>> 0.17015 Error >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm >>>>>>> UNKNOWN Error::Simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm >>>>>>> 5.68 Exporter /usr/share/perl/5.18/Exporter.pm >>>>>>> 5.68 Exporter::Heavy /usr/share/perl/5.18/Exporter/Heavy.pm >>>>>>> UNKNOWN Fasta >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Fasta.pm >>>>>>> UNKNOWN FastaChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm >>>>>>> UNKNOWN FastaChunker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm >>>>>>> UNKNOWN FastaDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm >>>>>>> UNKNOWN FastaFile >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm >>>>>>> UNKNOWN FastaSeq >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm >>>>>>> 1.11 Fcntl /usr/lib/perl/5.18/Fcntl.pm >>>>>>> 2.84 File::Basename /usr/share/perl/5.18/File/Basename.pm >>>>>>> 2.26 File::Copy /usr/share/perl/5.18/File/Copy.pm >>>>>>> 1.20 File::Glob /usr/lib/perl/5.18/File/Glob.pm >>>>>>> 1.20 File::NFSLock >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm >>>>>>> 2.09 File::Path /usr/share/perl/5.18/File/Path.pm >>>>>>> 3.40 File::Spec /usr/lib/perl/5.18/File/Spec.pm >>>>>>> 3.40 File::Spec::Unix /usr/lib/perl/5.18/File/Spec/Unix.pm >>>>>>> 0.2304 File::Temp /usr/local/share/perl/5.18.1/File/Temp.pm >>>>>>> 1.09 File::Which >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm >>>>>>> 2.02 FileHandle /usr/share/perl/5.18/FileHandle.pm >>>>>>> 1.51 FindBin /usr/share/perl/5.18/FindBin.pm >>>>>>> UNKNOWN GFFDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> UNKNOWN GI /usr/local/annotation/maker2.31/bin/../lib/GI.pm >>>>>>> 2.42 Getopt::Long /usr/local/share/perl/5.18.1/Getopt/Long.pm >>>>>>> 6.02 HTTP::Date /usr/share/perl5/HTTP/Date.pm >>>>>>> 6.05 HTTP::Headers /usr/share/perl5/HTTP/Headers.pm >>>>>>> 6.06 HTTP::Message /usr/share/perl5/HTTP/Message.pm >>>>>>> 6.00 HTTP::Request /usr/share/perl5/HTTP/Request.pm >>>>>>> 6.04 HTTP::Response /usr/share/perl5/HTTP/Response.pm >>>>>>> 6.03 HTTP::Status /usr/share/perl5/HTTP/Status.pm >>>>>>> 1.28 IO /usr/lib/perl/5.18/IO.pm >>>>>>> 1.16 IO::File /usr/lib/perl/5.18/IO/File.pm >>>>>>> 1.34 IO::Handle /usr/lib/perl/5.18/IO/Handle.pm 
>>>>>>> 1.1 IO::Seekable /usr/lib/perl/5.18/IO/Seekable.pm >>>>>>> 1.21 IO::Select /usr/lib/perl/5.18/IO/Select.pm >>>>>>> 1.36 IO::Socket /usr/lib/perl/5.18/IO/Socket.pm >>>>>>> 1.33 IO::Socket::INET /usr/lib/perl/5.18/IO/Socket/INET.pm >>>>>>> 1.24 IO::Socket::UNIX /usr/lib/perl/5.18/IO/Socket/UNIX.pm >>>>>>> 1.13 IPC::Open3 /usr/share/perl/5.18/IPC/Open3.pm >>>>>>> 0.53 Inline /usr/local/share/perl/5.18.1/Inline.pm >>>>>>> UNKNOWN Inline::denter >>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm >>>>>>> UNKNOWN Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm >>>>>>> UNKNOWN Iterator::Any >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm >>>>>>> UNKNOWN Iterator::Fasta >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm >>>>>>> UNKNOWN Iterator::GFF3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm >>>>>>> 6.05 LWP /usr/share/perl5/LWP.pm >>>>>>> UNKNOWN LWP::MemberMixin /usr/share/perl5/LWP/MemberMixin.pm >>>>>>> 6.00 LWP::Protocol /usr/share/perl5/LWP/Protocol.pm >>>>>>> 6.05 LWP::UserAgent /usr/share/perl5/LWP/UserAgent.pm >>>>>>> 0.33 List::MoreUtils >>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm >>>>>>> 1.38 List::Util /usr/local/lib/perl/5.18.1/List/Util.pm >>>>>>> UNKNOWN MAKER::ConfigData >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm >>>>>>> 1.32 POSIX /usr/lib/perl/5.18/POSIX.pm >>>>>>> 0.01 Parallel::Application::MPI >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI >>>>>>> .pm >>>>>>> 0.02 Perl::Unsafe::Signals >>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm >>>>>>> UNKNOWN PhatHit_utils >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm >>>>>>> UNKNOWN PostData >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm >>>>>>> 1.0 Proc::ProcessTable_simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm >>>>>>> 1.0 Proc::Signal >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm >>>>>>> UNKNOWN Process::MpiChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm >>>>>>> UNKNOWN Process::MpiTiers >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm >>>>>>> 1.38 Scalar::Util /usr/local/lib/perl/5.18.1/Scalar/Util.pm >>>>>>> 1.02 SelectSaver /usr/share/perl/5.18/SelectSaver.pm >>>>>>> UNKNOWN Shadower >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm >>>>>>> UNKNOWN SimpleCluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm >>>>>>> 2.009 Socket /usr/lib/perl/5.18/Socket.pm >>>>>>> UNKNOWN SpaceBase >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm >>>>>>> 2.45 Storable /usr/local/lib/perl/5.18.1/Storable.pm >>>>>>> 1.07 Symbol /usr/share/perl/5.18/Symbol.pm >>>>>>> 1.17 Sys::Hostname /usr/lib/perl/5.18/Sys/Hostname.pm >>>>>>> 0.21 Sys::SigAction >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm >>>>>>> UNKNOWN Sys::SigAction::Alarm >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm >>>>>>> 4.02 Term::ANSIColor /usr/share/perl/5.18/Term/ANSIColor.pm >>>>>>> 4.2 Tie::Handle /usr/share/perl/5.18/Tie/Handle.pm >>>>>>> 1.04 Tie::Hash /usr/share/perl/5.18/Tie/Hash.pm >>>>>>> 4.3 Tie::StdHandle /usr/share/perl/5.18/Tie/StdHandle.pm >>>>>>> 1.9726 Time::HiRes /usr/local/lib/perl/5.18.1/Time/HiRes.pm >>>>>>> 1.2300 Time::Local /usr/share/perl/5.18/Time/Local.pm >>>>>>> 1.60 URI /usr/share/perl5/URI.pm >>>>>>> 3.31 URI::Escape /usr/share/perl5/URI/Escape.pm >>>>>>> UNKNOWN Widget >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm >>>>>>> UNKNOWN Widget::RepeatMasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm >>>>>>> UNKNOWN Widget::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastx 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome. >>>>>>> pm >>>>>>> UNKNOWN Widget::exonerate::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.p >>>>>>> m >>>>>>> UNKNOWN Widget::exonerate::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2geno >>>>>>> me.pm >>>>>>> UNKNOWN Widget::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm >>>>>>> >>>>>>> UNKNOWN Widget::formater >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm >>>>>>> >>>>>>> UNKNOWN Widget::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm >>>>>>> >>>>>>> UNKNOWN Widget::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm >>>>>>> >>>>>>> UNKNOWN Widget::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm >>>>>>> >>>>>>> UNKNOWN Widget::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm >>>>>>> >>>>>>> 0.16 XSLoader /usr/share/perl/5.18/XSLoader.pm >>>>>>> 0.21 attributes /usr/lib/perl/5.18/attributes.pm >>>>>>> >>>>>>> 2.18 base /usr/share/perl/5.18/base.pm >>>>>>> 1.04 bytes /usr/share/perl/5.18/bytes.pm >>>>>>> UNKNOWN clean >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm >>>>>>> UNKNOWN cluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm >>>>>>> >>>>>>> UNKNOWN compare >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm >>>>>>> >>>>>>> 1.27 constant /usr/share/perl/5.18/constant.pm >>>>>>> >>>>>>> UNKNOWN ds_utility >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm >>>>>>> >>>>>>> UNKNOWN exonerate::splice_info >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm >>>>>>> >>>>>>> 0.34 forks /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 2.08001 forks::Devel::Symdump >>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm >>>>>>> 0.34 forks::shared /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> >>>>>>> 0.34 forks::signals >>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm >>>>>>> 1.00 integer /usr/share/perl/5.18/integer.pm >>>>>>> >>>>>>> 0.63 lib /usr/lib/perl/5.18/lib.pm >>>>>>> 1.02 locale /usr/share/perl/5.18/locale.pm >>>>>>> UNKNOWN maker::auto_annotator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm >>>>>>> >>>>>>> UNKNOWN maker::join >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm >>>>>>> >>>>>>> UNKNOWN maker::quality_index >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm >>>>>>> >>>>>>> UNKNOWN maker::sens_spec >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm >>>>>>> >>>>>>> 1.22 overload /usr/share/perl/5.18/overload.pm >>>>>>> >>>>>>> 0.02 overloading /usr/share/perl/5.18/overloading.pm >>>>>>> >>>>>>> 0.225 parent /usr/share/perl/5.18/parent.pm >>>>>>> UNKNOWN polisher >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::altest >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::est >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm >>>>>>> >>>>>>> UNKNOWN repeat_mask_seq >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm >>>>>>> >>>>>>> 0.1 runlog >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm >>>>>>> UNKNOWN shadow_AED >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm >>>>>>> 1.07 sigtrap /usr/share/perl/5.18/sigtrap.pm >>>>>>> >>>>>>> 1.07 strict /usr/share/perl/5.18/strict.pm >>>>>>> 1.77 threads /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 1.33 threads::shared >>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> 1.03 vars /usr/share/perl/5.18/vars.pm >>>>>>> 1.18 warnings /usr/share/perl/5.18/warnings.pm >>>>>>> >>>>>>> 1.02 warnings::register >>>>>>> /usr/share/perl/5.18/warnings/register.pm >>>>>>> STATUS: Parsing control files... >>>>>>> Calling GI::load_control_files at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452. >>>>>>> Calling GI::new_instance_temp at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463. >>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 465. >>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 483. >>>>>>> STATUS: Processing and indexing input FASTA files... >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 529. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. 
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. 
>>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. 
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 574. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 623. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> STATUS: Setting up database for any GFF3 input... 
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 629. >>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 631. >>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 635. >>>>>>> A data structure will be created for you at: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_datastore >>>>>>> >>>>>>> To access files for individual sequences use the datastore index: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_master_datastore_index.log >>>>>>> >>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 636. >>>>>>> Calling Iterator::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639. >>>>>>> Calling Iterator::Fasta::skip_file at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641. >>>>>>> Calling Iterator::Fasta::step at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643. >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!! 
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... 
>>>>>>> #--------------------------------------------------------------------- >>>>>>> Now retrying the contig!! >>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> Tries: 2!! >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... >>>>>>> >>>>>>> >>>>>>> Maker is now finished!!! >>>>>>> >>>>>>> Many thanks for your help >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt : >>>>>>> Your problem is one of the following. You need to reinstall the >>>>>>> DBD::SQLite module, you are running in a directory you don't have >>>>>>> permissions for, you set your TMPDIR environment variable or TMP value >>>>>>> in maker_opts.ctl to an NFS mounted or memory mounted directory, or you >>>>>>> are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that >>>>>>> has issues (probably with DB or SQLite modules). You can also >>>>>>> completely delete the output directory, and start again to see if it was >>>>>>> just a random error. You should look at each of those first. You can >>>>>>> also run MAKER with the --debug command line flag and send it to me if >>>>>>> all of those seem not to be the issue. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Wednesday, March 19, 2014 at 5:09 AM >>>>>>> To: >>>>>>> Subject: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I'm installing/using maker2 for the first time and I get an error when >>>>>>> using it. >>>>>>> >>>>>>> I am certainly missing something, but I don't know what. >>>>>>> >>>>>>> I compiled maker with no error message and I have all these directories >>>>>>> after compilation: >>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README src >>>>>>> >>>>>>> Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I >>>>>>> have this error: >>>>>>> >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) >>>>>>> failed: unable to open database file at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> ... >>>>>>> >>>>>>> ideas? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> >>>>> >>>> >>> >> > From jfierst at uoregon.edu Fri Mar 21 09:43:59 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 21 Mar 2014 08:43:59 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi, I just wanted to say thanks for all your help- I did the reciprocal best blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you > need to know which genes you want to translate names to (two column input > file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST > forward is the best way to do this, although I do think maker_map_ids is > the tool to use in the end.
The question is how to make a list of IDs to > translate as the input to maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then > do reciprocal best BLAST hits. To do this you BLAST your reference > proteins against your maker proteins. Then do the opposite, BLAST your > maker proteins against your reference proteins. If they are both each > other's best hit, then they are orthologous, and you can safely make a two > column entry for the maker_map_ids input (i.e. maker-gene-1 translates into > smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, So do you have one strain that you want to use as the reference > for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name.
I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Fri Mar 21 09:54:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 09:54:15 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I'm glad we could help. --Carson From: Janna Fierst Date: Friday, March 21, 2014 at 9:43 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] associating gene names between related strains Hi, I just wanted to say thanks for all your help- I did the reciprocal best blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to > know which genes you want to translate names to (two column input file, column > 1 -> original ID, column 2 -> new ID). I'm not sure EST forward is the best > way to do this, although I do think maker_map_ids is the tool to use in the > end. The question is how to make a list of IDs to translate as the input to > maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then do > reciprocal best BLAST hits. To do this you BLAST your reference proteins > against your maker proteins. Then do the opposite, BLAST your maker proteins > against your reference proteins.
If they are both each other's best hit, then > they are orthologous, and you can safely make a two column entry for the > maker_map_ids input (i.e. maker-gene-1 translates into smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, So do you have one strain that you want to use as the reference for > all the others? There's a script that comes with MAKER called maker_map_ids > that lets you use a common prefix or suffix for entries in a fasta file from > one strain and then use est_forward to use that ID in the gene models for the > other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna > Fierst [jfierst at uoregon.edu] > Sent: Friday, March 14, 2014 10:06 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] associating gene names between related strains > > Hi, > > we are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate the > gene naming so that orthologs between species can be associated by name. I > have been playing around a little with the est_forward option but can't figure > out a good system/workflow that preserves names but still uses the > strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
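The reciprocal best BLAST hit procedure Carson describes can be scripted. The sketch below is one possible approach, not a MAKER tool: it assumes both BLASTP searches were written in BLAST+ tabular format (`-outfmt 6`, where column 12 is the bitscore), keeps the highest-bitscore hit per query, and retains only pairs that are each other's best hit. The function names and the demo rows are hypothetical; the output is the two-column list (column 1 -> original ID, column 2 -> new ID) described in the thread.

```python
"""Reciprocal best BLAST hits -> two-column ID map (sketch)."""
import csv
import io
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Return {query ID: subject ID}, keeping the highest-bitscore hit per query.

    Assumes BLAST+ tabular output (-outfmt 6): column 1 is qseqid,
    column 2 is sseqid and column 12 is bitscore.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(maker_vs_ref: Iterable[List[str]],
                         ref_vs_maker: Iterable[List[str]]) -> List[Tuple[str, str]]:
    """Return sorted (maker ID, reference ID) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted((m, r) for m, r in forward.items() if reverse.get(r) == m)


def read_tsv(handle) -> List[List[str]]:
    """Split a tab-delimited BLAST report into rows of fields."""
    return list(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Tiny made-up rows for illustration; real input comes from the two BLASTP runs.
    maker_vs_ref = io.StringIO(
        "maker-gene-1\tsmug1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "maker-gene-2\tsmug1\t60.0\t200\t80\t2\t1\t200\t5\t205\t1e-40\t150\n"
    )
    ref_vs_maker = io.StringIO(
        "smug1\tmaker-gene-1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    )
    pairs = reciprocal_best_hits(read_tsv(maker_vs_ref), read_tsv(ref_vs_maker))
    for maker_id, ref_id in pairs:
        # column 1 -> original ID, column 2 -> new ID
        print(f"{maker_id}\t{ref_id}")
```

In the demo, maker-gene-2 also hits smug1, but smug1's best hit is maker-gene-1, so only that pair is kept. Ties in bitscore keep whichever hit appears first; close paralogs may be worth checking by hand before renaming.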
From Hossein.Borhan at AGR.GC.CA Fri Mar 21 10:41:38 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Fri, 21 Mar 2014 16:41:38 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Dear Carson I ran maker with the modified .pm files and it resolved the problem with the fasta output. Thanks a lot for your help. HB On 14-03-17 1:45 PM, "Carson Holt" wrote: >I have attached 4 files for you to place in the .../maker/Widgets/ >directory. > >The *blast.pm files will suppress the BLAST+ failures you are getting >(alternatively you can just downgrade to BLAST 2.2.27 to get the same >effect). BLAST 2.2.29 gives a lot of warnings etc., which you can ignore. >In the latest release NCBI redid all their warnings and error codes so it >spits out a lot of garbage and fails with different messages than it did >before. For example BLAST now warns you every time it encounters a fasta >header with a comment (virtually every fasta entry in existence falls in >this category), so your screen will be awash with meaningless warning >messages. > >The fgenesh.pm file will fix the other failure, which only occurs if you >use fgenesh simultaneously with the est_fusion=1 option. No other >predictors are affected. > >Thanks, >Carson > > >On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: > >>Dear Carson >> >>Sorry for the late reply. I was away for a couple of days. I have >>uploaded >>the output files plus control and error output on the FTP site that you >>provided. >>The user ID is borhanh >> >>I used blast+ for this run. >> >> >> >> >>Regards >> >> >>HB >> >> >> >> >> >> >> >> >>On 14-03-13 10:00 AM, "Carson Holt" >>wrote: >> >>>Just resending this to the correct maker-devel address. When >>>replying, please do not CC the incorrect maker-devel-bounce address.
>>> >>>Thanks, >>>Carson >>> >>> >>>On 3/13/14, 9:56 AM, "Carson Holt" >>>wrote: >>> >>>>FGENESH is not a heavily used tool, so depending on which version it is >>>>(either too old or too new), output might be slightly different, which >>>>could cause incorrect parsing. Could you tar up your maker.output >>>>folder, >>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>(send me your user/guest ID after you upload). >>>> >>>>For the BLAST error, use BLAST+ instead. You are using blastall, which >>>>is >>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>blast type in maker_bopts.ctl and the location of executables in >>>>maker_exe.ctl. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>> >>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Dear Maker users >>>>> >>>>> >>>>>I ran maker (2.31) on a fungal genome and found out that it inserted >>>>>the >>>>>word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>>in the nucleotide sequence of some of the genes. This seems >>>>>to >>>>>be related to transcripts predicted by fgenesh_masked.
>>>>> >>>>> >>>>>Here is an example for one of the genes >>>>> >>>>> >>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>offset:0 AE >>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>TTTCGACAAGC >>>>> >>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>without such problem. I checked the sequence for the scaffold related >>>>>to >>>>>one of the affected transcripts and there was no error in the >>>>>sequence. >>>>>I am not sure what is causing this. The only error that I could spot >>>>>in >>>>>the output error file is the following >>>>> >>>>> >>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>contexts/frames of query sequences. 
>>>>> >>>>> >>>>> >>>>>Your help is appreciated >>>>> >>>>> >>>>> >>>>>HB >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >> > From carsonhh at gmail.com Fri Mar 21 10:43:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 10:43:10 -0600 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts Message-ID: Thanks for letting me know. --Carson On 3/21/14, 10:41 AM, "Borhan, Hossein" wrote: >Dear Carson > >I ran maker with the modified .pm files and it resolved the problem with the >fasta output. Thanks a lot for your help. > > > > >HB > > > > > > > > >On 14-03-17 1:45 PM, "Carson Holt" wrote: > >>I have attached 4 files for you to place in the .../maker/Widgets/ >>directory. >> >>The *blast.pm files will suppress the BLAST+ failures you are getting >>(alternatively you can just downgrade to BLAST 2.2.27 to get the same >>effect). BLAST 2.2.29 gives a lot of warnings etc., which you can ignore. >>In the latest release NCBI redid all their warnings and error codes so it >>spits out a lot of garbage and fails with different messages than it did >>before. For example BLAST now warns you every time it encounters a fasta >>header with a comment (virtually every fasta entry in existence falls in >>this category), so your screen will be awash with meaningless warning >>messages. >> >>The fgenesh.pm file will fix the other failure, which only occurs if you >>use fgenesh simultaneously with the est_fusion=1 option. No other >>predictors are affected. >> >>Thanks, >>Carson >> >> >>On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: >> >>>Dear Carson >>> >>>Sorry for the late reply. I was away for a couple of days. I have >>>uploaded >>>the output files plus control and error output on the FTP site that you >>>provided. >>>The user ID is borhanh >>> >>>I used blast+ for this run.
>>> >>> >>> >>> >>>Regards >>> >>> >>>HB >>> >>> >>> >>> >>> >>> >>> >>> >>>On 14-03-13 10:00 AM, "Carson Holt" >>>wrote: >>> >>>>Just resending this to the correct maker-devel address. When >>>>replying, please do not CC the incorrect maker-devel-bounce address. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>>On 3/13/14, 9:56 AM, "Carson Holt" >>>>wrote: >>>> >>>>>FGENESH is not a heavily used tool, so depending on which version it >>>>>is >>>>>(either too old or too new), output might be slightly different, which >>>>>could cause incorrect parsing. Could you tar up your maker.output >>>>>folder, >>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>(send me your user/guest ID after you upload). >>>>> >>>>>For the BLAST error, use BLAST+ instead. You are using blastall, which >>>>>is >>>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>>blast type in maker_bopts.ctl and the location of executables in >>>>>maker_exe.ctl. >>>>> >>>>>Thanks, >>>>>Carson >>>>> >>>>> >>>>> >>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>>wrote: >>>>> >>>>>>Dear Maker users >>>>>> >>>>>> >>>>>>I ran maker (2.31) on a fungal genome and found out that it inserted >>>>>>the >>>>>>word SCALAR followed by a pair of brackets like this (0x22de7020) >>>>>>in the nucleotide sequence of some of the genes. This seems >>>>>>to >>>>>>be related to transcripts predicted by fgenesh_masked.
>>>>>> >>>>>> >>>>>>Here is an example for one of the genes >>>>>> >>>>>> >>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>>offset:0 AE >>>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>>TTTCGACAAGC >>>>>> >>>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>>without such problem. I checked the sequence for the scaffold related >>>>>>to >>>>>>one of the affected transcripts and there was no error in the >>>>>>sequence. >>>>>>I am not sure what is causing this. The only error that I could spot >>>>>>in >>>>>>the output error file is the following >>>>>> >>>>>> >>>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>>contexts/frames of query sequences. 
>>>>>> >>>>>> >>>>>> >>>>>>Your help is appreciated >>>>>> >>>>>> >>>>>> >>>>>>HB >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Mon Mar 24 04:08:25 2014 From: marc.hoeppner at imbim.uu.se (=?iso-8859-1?Q?Marc_H=F6ppner?=) Date: Mon, 24 Mar 2014 10:08:25 +0000 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se> Hi, I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and set alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel. However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files? Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From sujaikumar at gmail.com Mon Mar 24 08:15:16 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 14:15:16 +0000 Subject: [maker-devel] Dashes in transcript predictions Message-ID: Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. 
Example:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is OK:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. All the .maker.output files are attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz
Type: application/x-gzip
Size: 45641 bytes

From carsonhh at gmail.com  Mon Mar 24 10:49:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 10:49:46 -0600
Subject: [maker-devel] Dashes in transcript predictions

I've actually never seen that before, but looking through your output, it appears to be caused specifically by setting correct_est_fusion=1 and how that interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download.

Thanks,
Carson

From: Sujai
Date: Monday, March 24, 2014 at 8:15 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Dashes in transcript predictions

Dear Maker Team,

On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them.

Example:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is OK:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. All the .maker.output files are attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: join.pm
Type: text/x-perl-script
Size: 18645 bytes

From carsonhh at gmail.com  Mon Mar 24 11:05:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 11:05:15 -0600
Subject: [maker-devel] Annotations from proteins, follow-up

It's not so much intentional as it is a limitation of the information in GFF3-format alignments. Right now, protein2genome for eukaryotes will only try to make exonerate-derived alignments work, because they have been polished around splice sites and MAKER still has access to the original protein sequence and alignment CIGAR string for additional filtering, etc. With GFF3 pass-through, the algorithm doesn't know nearly as much about what is passed in.
For example, the protein sequence is gone, CIGAR alignment strings are rarely included (the Gap= attribute in GFF3), and it's not always clear whether the alignment was polished for splice sites. Also, since protein2genome=1 is expected to be used only to generate an initial training set, and not for final annotations, this is considered a reasonable restriction.

If you still really want to force protein alignments from a GFF3 file to be considered as potential models, you could supply them as pred_gff, in which case they will always be considered as potential models. Of course, the result will be relatively ugly, because you lack the things I mentioned before, such as the alignment CIGAR string and the original protein sequence, which are normally used to filter protein2genome results for inclusion as models.

--Carson

On 3/24/14, 4:08 AM, "Marc Höppner" wrote:

>Hi,
>
>I had previously inquired about protein-based gene building (for example,
>to create a training set for SNAP). This is currently possible with Maker
>(2.31), but I noticed a limitation. Specifically, I tend to run Maker
>once to generate all the raw computes (protein and set alignments,
>mostly). I then separate these out into GFF files that I can store away
>and use in various combinations of settings and data in parallel.
>
>However, the protein2genome option does not seem to work off pre-aligned
>protein data (e.g. protein2genome.gff produced with Maker). Is that
>intentional, and is there a work-around? Or is the only option to run this
>with FASTA files?
>
>Cheers,
>
>Marc
>
>Marc P. Hoeppner, PhD
>
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at imbim.uu.se

From carsonhh at gmail.com  Mon Mar 24 12:15:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 12:15:39 -0600
Subject: [maker-devel] Dashes in transcript predictions

One more note on this. The sequence is actually fully correct if you just remove the '-' characters. So if you don't want to rerun MAKER with the patch, you can use the attached script to repair the transcript file by removing the '-' characters. Your GFF3 files and protein files should already be correct as is.

Usage --> perl fix_dash transcript_file.fasta > new_file.fasta

You may need to place the script in the .../maker/bin/ directory so it can detect BioPerl if you don't have BioPerl installed system-wide.

Thanks,
Carson

From: Carson Holt
Date: Monday, March 24, 2014 at 10:49 AM
To: Sujai, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Dashes in transcript predictions

I've actually never seen that before, but looking through your output, it appears to be caused specifically by setting correct_est_fusion=1 and how that interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download.

Thanks,
Carson

From: Sujai
Date: Monday, March 24, 2014 at 8:15 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Dashes in transcript predictions

Dear Maker Team,

On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them.

Example:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is OK:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. All the .maker.output files are attached.
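The fix_dash script Carson attached is not preserved in this archive. As a rough, hedged illustration of the repair he describes (deleting the '-' characters from the transcript FASTA, while the GFF3 and protein files stay as they are), here is a minimal Python sketch. The function names are hypothetical, and it assumes a plain FASTA file in which only sequence lines should be touched: header lines are passed through unchanged because MAKER IDs such as snap_masked-...-mRNA-1 legitimately contain hyphens.

```python
from typing import Iterable, Iterator


def strip_alignment_dashes(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with '-' removed from sequence lines only.

    Header lines (starting with '>') are yielded unchanged, because MAKER
    transcript IDs legitimately contain hyphens.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("-", "")


def fix_fasta_file(in_path: str, out_path: str) -> None:
    """Write a dash-free copy of a transcript FASTA file (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_alignment_dashes(src))
```

For example, fix_fasta_file("transcript_file.fasta", "new_file.fasta") mirrors the file names in Carson's usage line. The sketch does not rewrap sequence lines, so line lengths may become uneven after dashes are removed; most FASTA parsers tolerate that.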
From sujaikumar at gmail.com  Mon Mar 24 12:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions

Wow. That was a super quick response. Thanks very much for confirming the problem and the fixes!

On 24 March 2014 18:15, Carson Holt wrote:

> One more note on this. The sequence is actually fully correct if you just
> remove the '-' characters. So if you don't want to rerun MAKER with the
> patch, you can use the attached script to repair the transcript
> file by removing the '-' characters. Your GFF3 files and protein files
> should already be correct as is.
>
> Usage --> perl fix_dash transcript_file.fasta > new_file.fasta
>
> You may need to place the script in the .../maker/bin/ directory so it can
> detect BioPerl if you don't have BioPerl installed system-wide.
>
> Thanks,
> Carson
>
> From: Carson Holt
> Date: Monday, March 24, 2014 at 10:49 AM
> To: Sujai, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Dashes in transcript predictions
>
> I've actually never seen that before, but looking through your output, it
> appears to be caused specifically by setting correct_est_fusion=1 and how
> that interacts with some features of your dataset.
>
> I've attached a patch in the form of a file you can use to replace
> .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER
> download.
>
> Thanks,
> Carson
>
> From: Sujai
> Date: Monday, March 24, 2014 at 8:15 AM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] Dashes in transcript predictions
>
> Dear Maker Team,
>
> On a recent run with maker 2.31, I noticed that a couple of the
> transcripts had dashes/hyphens in them.
>
> Example:
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
> TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
> AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
> GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
> GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
> AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
> ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
> AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
> TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA
>
> The protein prediction for this transcript is OK:
>
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
> MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
> DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV
>
> Is this a known bug? I tried searching for "dash|hyphen" in the email list
> but couldn't find anything else.
>
> Best wishes,
>
> - Sujai
>
> ps. I pulled out just this one contig and ran maker on it. All the
> .maker.output files are attached.

From diana.garnica at anu.edu.au  Mon Mar 24 17:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,

We recently assembled a fungal genome using MAKER and got the gene models and the corresponding transcripts, predicted proteins, and GFF files. However, the predicted proteins do not include the stop codon, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested, I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?

Thanks!

Diana

From carsonhh at gmail.com  Mon Mar 24 17:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER

You are probably getting the wrong proteins from your scripts because you are not taking into account the 5' and 3' UTR in the transcript. For example:

>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261 bp and the 3' UTR is 22 bp long. Both have to be trimmed before translating the transcript into a protein. Once they are trimmed, you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTR. Example:

fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.

Thanks,
Carson

From: Diana Garnica Moreno
Date: Monday, March 24, 2014 at 5:11 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER

Hi there,

We recently assembled a fungal genome using MAKER and got the gene models and the corresponding transcripts, predicted proteins, and GFF files. However, the predicted proteins do not include the stop codon, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested, I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?

Thanks!

Diana

From daniel.standage at gmail.com  Tue Mar 25 07:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image

Greetings,

I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 04:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and detected a situation that appears with a certain frequency (see the attached Apollo screenshot illustrating the situation I am going to describe):

When there is est2genome evidence supporting the prediction of the 5' UTR region, I have noticed that in some of these transcripts with a 5' UTR, MAKER fails to identify the correct downstream ATG start codon and instead takes a TTG codon (coding for L) as the protein start. The proper ATG start codon is further downstream, as the ab initio predictors (SNAP + AUGUSTUS) correctly predict in this case (see the attached screenshot).

Any comments on this?

Thanks!
ernesto

--
Developer

VectorBase | Ensembl Genomes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes

From carsonhh at gmail.com  Tue Mar 25 08:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>

This is caused by BioPerl's is_start_codon method and default codon table returning true for non-canonical start codons. It was resolved some time ago (see the previous discussion --> https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY). Make sure you are using the most recent version of MAKER (currently 2.31).

Thanks,
Carson

On 3/25/14, 4:10 AM, "ernesto lowy gallego" wrote:

>Hi,
>
>I have been inspecting the MAKER predictions and detected a situation
>that appears with a certain frequency (see the attached Apollo screenshot
>illustrating the situation I am going to describe):
>
>When there is est2genome evidence supporting the prediction of the 5' UTR
>region, I have noticed that in some of these transcripts with a 5' UTR,
>MAKER fails to identify the correct downstream ATG start codon and
>instead takes a TTG codon (coding for L) as the protein start. The proper
>ATG start codon is further downstream, as the ab initio predictors
>(SNAP + AUGUSTUS) correctly predict in this case (see the attached
>screenshot).
>
>Any comments on this?
>
>Thanks!
>
>ernesto
>
>--
>Developer
>
>VectorBase | Ensembl Genomes

From carsonhh at gmail.com  Tue Mar 25 08:24:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:24:36 -0600
Subject: [maker-devel] Maker iPlant image

--> /opt/maker/bin/maker

It looks like most preinstalled software is under /opt on the image.

Thanks,
Carson

From: Daniel Standage
Date: Tuesday, March 25, 2014 at 7:24 AM
To: Maker Mailing List
Subject: [maker-devel] Maker iPlant image

Greetings,

I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From darasappan at gmail.com  Tue Mar 25 10:33:59 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 25 Mar 2014 11:33:59 -0500
Subject: [maker-devel] maker to EvidenceModeler
Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

Hi Carson and others,

Is there an easy tool/pipeline available as part of the maker utilities to convert maker and SNAP output to files accepted by EvidenceModeler? It looks like it also needs just GFF files, but with a few tweaks.
EvidenceModeler seems better equipped to handle PASA annotation results than maker results.

Thanks,
Dhivya

From barry.utah at gmail.com  Tue Mar 25 11:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi Diana,

There is a Perl library, the Genome Annotation Library (GAL), that is designed to make writing code like this easy. I just added a script to this library called gal_CDS_sequence, which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files. You can access the library here:

http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:

https://lists.sourceforge.net/lists/listinfo/song-devel

Hope that helps,
Barry

On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote:

> Hi there,
>
> We recently assembled a fungal genome using MAKER and got the gene models
> and the corresponding transcripts, predicted proteins, and GFF files.
> However, the predicted proteins do not include the stop codon, so I do not
> know which proteins are complete and which ones are incomplete at the 3'
> end. To solve that, I have used different programs to extract the FASTA
> sequence of the CDSs given the GFF file and the genome sequence. The
> problem is that with the tools I have tested, I get the right sequence for
> some of the proteins and wrong sequences for others (with multiple stop
> codons, for example). I am not sure why this happens, and since it happens
> with different tools (different Python scripts and even gffread from
> Cufflinks), I do not know where the problem is. Could you please give me
> some advice on how to extract the right sequences with the stop codons
> included?
>
> Thanks!
>
> Diana

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From kchilds at plantbiology.msu.edu  Wed Mar 26 08:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environment variables immediately after starting an ssh session:

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp

The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires BLAT, which was not installed on this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.

Augustus should work with a stock HMM on this image.
I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Plant Biology Department Michigan State University kchilds at plantbiology.msu.edu 517-775-2844 (m) 517-353-5969 (l) On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > --> /opt/maker/bin/maker > > It looks like most preinstalled software is under /opt on the image. > > Thanks, > Carson > > > From: Daniel Standage > Date: Tuesday, March 25, 2014 at 7:24 AM > To: Maker Mailing List > Subject: [maker-devel] Maker iPlant image > > Greetings, > > I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From steinj at cshl.edu Wed Mar 26 12:41:37 2014 From: steinj at cshl.edu (Stein, Joshua) Date: Wed, 26 Mar 2014 18:41:37 +0000 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: Also please note that there is a tutorial available here, particularly important if you want to use in MPI mode. https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial Josh Joshua Stein, PhD Manager, Sci. 
Informatics III Cold Spring Harbor Laboratory steinj at cshl.edu http://ware.cshl.org/ On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" wrote: > Daniel, > > There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environmental variables immediately after starting an ssh session. > > export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/ > export ZOE=/opt/maker/exe/snap > export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config > export TMP=/tmp > > The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires blat which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image. > > Augustus should work with a stock hmm on this image. > > I have not attempted to use GeneMark, and of course, fgenesh is a completely different story. > > Kevin Childs > > > --- > Kevin Childs, PhD > > Assistant Professor - Fixed Term > Plant Biology Department > Michigan State University > > kchilds at plantbiology.msu.edu > 517-775-2844 (m) > 517-353-5969 (l) > > On Mar 25, 2014, at 10:24 AM, Carson Holt wrote: > >> --> /opt/maker/bin/maker >> >> It looks like most preinstalled software is under /opt on the image. 
>> >> Thanks, >> Carson >> >> >> From: Daniel Standage >> Date: Tuesday, March 25, 2014 at 7:24 AM >> To: Maker Mailing List >> Subject: [maker-devel] Maker iPlant image >> >> Greetings, >> >> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sat Mar 29 10:24:05 2014 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sat, 29 Mar 2014 11:24:05 -0500 Subject: [maker-devel] Missing UTRs in GFF Message-ID: I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file.
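Ben's symptom (transcripts whose exons are all typed as CDS, with no five_prime_UTR or three_prime_UTR lines) can be screened for in a merged GFF3 file. The Python below is a hypothetical helper, not a MAKER tool: it lists mRNAs that have no UTR children, or whose summed CDS length is not a multiple of three. Ab initio models and partial genes can legitimately trip either check, so the output is a list of candidates to inspect by eye, not confirmed bugs.

```python
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def parse_attributes(field):
    """Parse a GFF3 column-9 string like 'ID=x;Parent=a,b' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def find_suspect_mrnas(lines):
    """Return {mRNA_id: [reasons]} for mRNAs worth checking by eye."""
    mrnas = []
    utr_parents = set()
    cds_length = defaultdict(int)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # MAKER GFF3 files may end with an embedded FASTA section
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        if ftype == "mRNA":
            mrnas.append(attrs["ID"])
        elif ftype in UTR_TYPES:
            utr_parents.update(parents)
        elif ftype == "CDS":
            for parent in parents:
                cds_length[parent] += end - start + 1
    suspects = {}
    for mrna in mrnas:
        reasons = []
        if mrna not in utr_parents:
            reasons.append("no UTR features")
        if cds_length[mrna] % 3 != 0:
            reasons.append("CDS length not a multiple of 3")
        if reasons:
            suspects[mrna] = reasons
    return suspects
```

Pass it the lines of the gff3_merge output, for example `find_suspect_mrnas(open(path).readlines())`.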
Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 [Attachment: missing_utr.gff, 2934 bytes] From mhinsley at ebi.ac.uk Mon Mar 31 04:20:10 2014 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Mon, 31 Mar 2014 11:20:10 +0100 Subject: [maker-devel] putative preponderance of short exons?? Message-ID: <5339415A.1020509@ebi.ac.uk> Hi, I've run Maker on a de novo assembly of a species of fly and then ran some simple statistics (intron/exon/CDS length, exons per gene) over the GFF output and compared them with a couple of other species. It all looks good except that there is a surprising number of very short exons: 6,000 under 50 bp, 3,500 under 30 bp and 878 under 10 bp, out of 87k exons in total (see attached PDF; black is Drosophila, red is A. gambiae, green is with 5' and 3' exons removed). I ran est2genome and protein2genome, then 3 cycles of Augustus and SNAP. I'm using maker 2.31 (unpatched). Anecdotally, these short exons appear without EST or protein evidence and they all line up with canonical splice sequences (GT----AG), but I've only looked at a few using Apollo. While there's no requirement that exons should be longer, I'm suspicious of this, as there must be some evolutionary relationship between these species. I've compared with another species annotated with Maker (using SNAP and Augustus) which is more distant (not yet publicly available), and the same pattern of short exons is present.
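Malcolm's exon-length tallies can be reproduced from MAKER's GFF3 output roughly as follows. This Python sketch assumes every exon feature carries a Parent attribute naming its transcript, and it interprets "5' and 3' exons removed" as dropping the first and last exon of each transcript by genomic position. Exons shared by several isoforms are counted once per isoform, so totals can exceed the number of distinct exons.

```python
from collections import defaultdict


def exon_lengths(lines, drop_terminal=False):
    """Collect exon lengths per transcript from GFF3 lines.

    With drop_terminal=True the first and last exon of every transcript
    (ordered by genomic position) are skipped."""
    by_parent = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        for item in cols[8].split(";"):
            item = item.strip()
            if item.startswith("Parent="):
                for parent in item[len("Parent="):].split(","):
                    by_parent[parent].append((start, end))
    lengths = []
    for exons in by_parent.values():
        exons.sort()
        if drop_terminal:
            exons = exons[1:-1]  # single- and two-exon transcripts drop out
        lengths.extend(end - start + 1 for start, end in exons)
    return lengths


def count_short(lengths, thresholds=(50, 30, 10)):
    """Return {threshold: number of exons strictly shorter than it}."""
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}
```

`count_short(exon_lengths(lines))` gives counts in the same form as the numbers above; repeating it with `drop_terminal=True` approximates the series with terminal exons removed.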
I wondered if they were created to fulfil the need for start/stop codons, but this does not appear to be the case (mostly they are mid-gene). Is there some way to adjust the predictors, e.g. to require external evidence? Or is there anything else you could suggest? I can see the following in the tutorial but I'm not sure how they could help:
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
thanks -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom [Attachment: exon_53.pdf, 10619 bytes] From carsonhh at gmail.com Mon Mar 31 07:52:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 07:52:15 -0600 Subject: [maker-devel] putative preponderance of short exons?? In-Reply-To: <5339415A.1020509@ebi.ac.uk> References: <5339415A.1020509@ebi.ac.uk> Message-ID: The intron/exon structure is determined by SNAP, Augustus, etc. It is not affected by any of the maker parameters; only evidence alignments are affected by the maker settings. You can try retraining or manually editing the HMMs, but these might also be regions where your assembly is incorrect, and those algorithms make short exons in order to make a structure work without getting stop codons mid-gene.
Thanks, Carson On 3/31/14, 4:20 AM, "Malcolm Hinsley" wrote: >Hi > >I've run Maker on a de novo assembly of a species of fly and then ran >some simple statistics (intron/ exon/ CDS length, exons per gene) over >the GFF output and compared with a couple of other species. >It all looks good except that there is a surprising number of very short >exons (6000 < 50 bp, 3500 < 30 bp, 878< 10 bp, 87k total - see attached >pdf), black is Drosophila, red is A. gambiae, green is with 5' and 3' >exons removed). > >I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP. >I'm using maker 2.31 (unpatched). > >Anecdotally, these short exons appear without EST or protein evidence >and they all line up with canonical splice sequences (GT----AG). >(but I've only looked at a few using Apollo). > >While there's no requirement that exons should be longer I'm suspicious >of this as there must be some evolutionary relationship between these >species. >I've compared with another species annotated with Maker (using SNAP >and Augustus) which is more distant (not yet publicly available), and >the same pattern of short exons is present. >I wondered if they were created to fulfil the need for start/stop >codons, but this does not appear to be the case (mostly they are >mid-gene). > > >Is there some way to adjust the predictors eg to require external >evidence? or anything else you could suggest? ...
I can see the >following in the tutorial but I'm not sure how they could help: >
>pred_flank=200 #flank for extending evidence clusters sent to gene predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
>min_protein=0 #require at least this many amino acids in predicted proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> > >thanks > >-- >malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 >European Bioinformatics Institute (EMBL-EBI) >European Molecular Biology Laboratory >Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD >United Kingdom > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Mar 31 08:37:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 31 Mar 2014 08:37:15 -0600 Subject: [maker-devel] Missing UTRs in GFF In-Reply-To: References: Message-ID: Not something I've seen before, but there was a patch for another issue that was caused by the use of avoid_est_fusion=1, and that may be related. Try the current stable release, 2.31, and let me know if it still happens. You can also upload the contig folder from one of the regions in question here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Then I could verify the bug and see if it is something that happens in the current release. --Carson From: Benjamin Rubin Date: Saturday, March 29, 2014 at 10:24 AM To: Subject: [maker-devel] Missing UTRs in GFF I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs.
For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file. Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file? Thanks for your help, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carson.holt at genetics.utah.edu Mon Mar 3 12:08:49 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 3 Mar 2014 19:08:49 +0000 Subject: [maker-devel] FW: error runinig agustus In-Reply-To: References: Message-ID: Forwarding this to the maker-devel list. On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: >I encountered the following error while running maker (2nd annotation >using gff file of the first maker run and trinity assembled RNA seq as >EST) > >ERROR: Augustus failed >--> rank=NA, hostname=rapa.agr.gc.ca > >Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >am using 2.31 > >Your help is appreciated > > >HB > > > > > From carsonhh at gmail.com Mon Mar 3 12:11:08 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 03 Mar 2014 12:11:08 -0700 Subject: [maker-devel] FW: error runinig agustus Message-ID: You will need to provide more detail. Probably the entire error log and the maker control files. Thanks, Carson On 3/3/14, 12:08 PM, "Carson Holt" wrote: >Forwarding this to the maker-devel list.
> > >On 3/3/14, 12:04 PM, "Borhan, Hossein" wrote: > >>I encountered the following error while running maker (2nd annotation >>using gff file of the first maker run and trinity assembled RNA seq as >>EST) >> >>ERROR: Augustus failed >>--> rank=NA, hostname=rapa.agr.gc.ca >> >>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I >>am using 2.31 >> >>Your help is appreciated >> >> >>HB >> >> >> >> >> > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From sjackman at gmail.com Tue Mar 4 19:10:42 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 4 Mar 2014 18:10:42 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and *three_prime_UTR* features. Ideally the feature type would be set to *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with 'rrn' and the tRNA gene names with 'trn', as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: > Set single_exon=1, and the minimum size to a smaller value. I think it's > set to 250 right now. Also est2genome is looking for an ORF, so if there is > none (as with tRNAs) they probably won't get picked up. > > --Carson > > Sent from my iPhone > > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > > Sorry, ignore my previous question.
est_forward also carries forward the > names of protein evidence and works like a charm. Thank you! > > The larger rrn16 and rrn23 genes annotated perfectly, but the smaller > rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They > are in the blastn output, and in the evidence_0.gff. rrn5 has perfect > identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value > (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing > these hits? > > organism_type=prokaryotic > est2genome=1 > protein2genome=1 > est_forward=1 > > Cheers, > Shaun > > > On 27 February 2014 15:17, Shaun Jackman wrote: > >> Is there a corresponding protein_forward=1 option to map forward protein >> names from protein2genome? >> >> Cheers, >> Shaun >> >> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >> wrote: >> >> Sorry I meant to say prefilter on the score in the mRNA column before >> passing the gff3 to model_gff. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >> >> What you can do is run it once with just est_forward=1 and >> est2genome/protein2genome set to 1. Then take those results, pass them in >> as model_gff and use the map_forward option to then filter the results >> based on mRNA score and that would copy names onto new genes under the >> standard MAKER pipeline. Eventually it's really supposed to go into a >> separate tool that will map genes onto new assemblies (but under the hood >> the tool will just be calling MAKER with certain parameters restricted). I >> do this because if people commonly use it mixed with things like SNAP I can >> start to get some very weird behaviors.
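Carson's two-pass suggestion (run once with est_forward=1, prefilter the resulting GFF3 on mRNA score, then pass it back in as model_gff with map_forward) needs a small filtering step between the runs. A minimal Python sketch of that step follows. It assumes the score Carson means is the standard GFF3 score field (column 6) of the mRNA lines; if your file carries it elsewhere, adjust the lookup. Features with a "." score fail the filter, and only genes, mRNAs and features whose Parent is a kept mRNA survive. `filter_by_mrna_score` and the cutoff value are hypothetical.

```python
def parse_attributes(field):
    """Parse a GFF3 column-9 string into a dict (no URL-unescaping)."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(lines, min_score):
    """Keep genes/mRNAs (and their children) whose mRNA score >= min_score."""
    records = []
    kept_mrnas, kept_genes = set(), set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # drop any trailing sequence section
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append((line, None, None))
            continue
        attrs = parse_attributes(cols[8])
        records.append((line, cols[2], attrs))
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            kept_mrnas.add(attrs["ID"])
            kept_genes.update(attrs.get("Parent", "").split(","))
    out = []
    for line, ftype, attrs in records:
        if ftype is None:
            out.append(line)  # headers, comments and blank lines pass through
        elif ftype == "gene":
            if attrs.get("ID") in kept_genes:
                out.append(line)
        elif ftype == "mRNA":
            if attrs.get("ID") in kept_mrnas:
                out.append(line)
        elif set(attrs.get("Parent", "").split(",")) & kept_mrnas:
            out.append(line)  # exons, CDS, UTRs of kept transcripts
    return out
```

Write the returned lines to a new file and point model_gff at that file for the second run.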
>> >> Thanks, >> Carson >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 3:04 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> It seems that this could be a very useful option in those cases where >> you have firm a priori knowledge of the placement of ESTs. However, while >> trying it I note that est_forward implies that the est2genome predictor is >> turned on, implicitly. Is this necessary for this to work? I'm after the >> behavior you describe below where exonerate is made to try really hard >> within a limited region to align an est, but I would not like maker to >> produce est2genome predictions. >> >> In general, I think this maker_coor and est_forward is a feature set that >> is worthy to be promoted into a documented feature. >> >> Thanks, >> Mikael >> >> 26 feb 2014 kl. 17:09 skrev Carson Holt : >> >> It will still work without est_forward. It just works a little >> differently. Keep in mind this was a hidden feature I used to find >> stubborn or hard-to-find missing genes after reassembly of a genome. >> >> If est_forward is provided, MAKER will parse the database to look for the >> maker_coor tags early in the pipeline. Then it will create a list of >> locations to search, and it will search them even if there are no BLAST >> results to seed the search (normally MAKER gets a BLAST result first and >> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >> look for a match using all of chr1 as the input to exonerate even when >> BLAST finds nothing (this is a very very slow search, but can help pick up >> one or two stubborn genes that don't remap well). To allow this, MAKER >> gives exonerate looser matching parameters (i.e. allows for single base >> pair introns perhaps caused by assembly errors).
The logic here is that >> given the fact that I already told MAKER that with some degree of >> confidence I expect sequence A to map to location X, it will try its >> hardest to make it match. >> >> Without est_forward set, the maker_coor= flag still gets read in GI.pm at >> line 1563, but only after a BLAST alignment has already seeded it to the >> region (that BLAST result has the information in its description >> parameter). MAKER will then ignore seeds completely outside of maker_coor. >> In addition any BLAST seeds that overlap maker_coor will get the search >> space for alignment polishing adjusted to match maker_coor exactly. Also >> match parameters for exonerate will not be relaxed as they were with >> est_forward. >> >> As you can see, the behavior is slightly different (because it's an >> accidental feature). >> >> Thanks, >> Carson >> >> >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 6:37 AM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> That might be a useful and time-saving accidental feature. But, reading >> the code, it seems that I need to supply maker_coor but not gene_id, as >> well as the configuration option est_forward for this to work. Any >> occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, >> right? >> >> Mikael >> >> 26 feb 2014 kl. 14:22 skrev Carson Holt : >> >> Yes. That should work as well as an accidental feature. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling < >> mikael.durling at slu.se> wrote: >> >> Can this use of maker_coor be used only to hint about the placement of >> the ests, without affecting the naming of the final genes? I.e., if I have a >> database of EST where I have a priori knowledge of their rough placement, >> can this placement be given to maker without providing est_forward=1? >> >> Thanks, >> Mikael >> >> 26 feb 2014 kl.
01:58 skrev Carson Holt : >> >> There is a way. It's not a standard option and it's undocumented, but >> if you add est_forward=1 to the maker_opts.ctl file, then it will do just >> that. The option won't already be there so you'll have to type it in. >> >> There is also a feature designed to work with this option. If you add >> tags to your fasta headers, those can be used to guide the mapping and >> naming. For example, gene_id= will ensure different isoforms >> that share a common gene_id get clustered into the same gene, >> and maker_coor=chr1:1-10000 in the fasta header will force a particular >> sequence to only be mapped against chr1 within the range of 1-10000 bp and >> just using maker_coor=chr1 will force it to only be mapped against chr1. >> >> This is an undocumented way to remap genes onto new assemblies using >> blast alignments of earlier transcript or protein annotations as a guide. >> >> --Carson >> >> >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, February 25, 2014 at 5:06 PM >> To: >> Subject: [maker-devel] Mapping gene names >> >> Hi, >> >> I'm annotating a genome using a closely related genome from Genbank, >> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to >> annotate my genome. I've run Maker, and the annotation seems to have worked >> well. Is it possible to map the names of the genes from the related species >> to my annotation? I see the *map_forward* option, which applies to the >> *model_gff* parameter. Is there a similar option for *est* and *protein*?
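The two FASTA header tags Carson describes can be added and validated with a few lines of Python. This is a sketch under assumptions: it writes the tags as whitespace-separated key=value pairs after the sequence ID (the exact header syntax MAKER parses is not spelled out in the thread), and `parse_maker_coor`, `tag_header` and the gene_id value `geneA` in the example are hypothetical.

```python
import re

# 'chr1' or 'chr1:1-10000'; the sequence name may not contain ':' or spaces.
COOR_RE = re.compile(r"^(?P<seq>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_maker_coor(value):
    """Split 'chr1:1-10000' -> ('chr1', 1, 10000) and 'chr1' -> ('chr1', None, None)."""
    m = COOR_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised maker_coor value: {value!r}")
    start, end = m.group("start"), m.group("end")
    if start is None:
        return m.group("seq"), None, None
    if int(start) > int(end):
        raise ValueError("maker_coor start is after end")
    return m.group("seq"), int(start), int(end)


def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id=/maker_coor= tags to a FASTA header line.

    Assumes whitespace-separated key=value tags after the sequence ID."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    parts = [header.rstrip("\n")]
    if gene_id is not None:
        parts.append(f"gene_id={gene_id}")
    if maker_coor is not None:
        parse_maker_coor(maker_coor)  # validate before writing
        parts.append(f"maker_coor={maker_coor}")
    return " ".join(parts)
```

For example, `tag_header(">tx1", gene_id="geneA", maker_coor="chr1:1-10000")` produces `>tx1 gene_id=geneA maker_coor=chr1:1-10000`.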
>> >> *maker_opts.ctl* >> >> est=NC_123456.frn >> protein=NC_123456.faa >> est2genome=1 >> protein2genome=1 >> >> Thanks, >> Shaun >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Mar 4 19:33:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 04 Mar 2014 19:33:12 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false-positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus, etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm.
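Because MAKER focuses on coding genes, Shaun's naming convention suggests a post-processing workaround for the est2genome rRNA/tRNA models. The Python sketch below is hypothetical, not a MAKER feature: it retypes mRNA features whose Name attribute starts with `rrn` or `trn` to rRNA or tRNA and drops their CDS and UTR children, leaving exons and the parent gene untouched. It assumes the carried-forward name sits in the mRNA's Name attribute.

```python
PREFIX_TO_TYPE = {"rrn": "rRNA", "trn": "tRNA"}
CODING_CHILDREN = {"CDS", "five_prime_UTR", "three_prime_UTR"}


def parse_attributes(field):
    """Parse a GFF3 column-9 string into a dict (no URL-unescaping)."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def retype_noncoding(lines):
    """Retype rrn*/trn* mRNAs to rRNA/tRNA and drop their CDS/UTR children."""
    records, retyped = [], set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append((line, None))  # headers and FASTA pass through
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "mRNA":
            name = attrs.get("Name", "")
            for prefix, new_type in PREFIX_TO_TYPE.items():
                if name.startswith(prefix):
                    cols[2] = new_type
                    retyped.add(attrs["ID"])
                    break
        records.append(("\t".join(cols) + "\n", (cols[2], attrs)))
    out = []
    for line, info in records:
        if info is not None:
            ftype, attrs = info
            parents = set(attrs.get("Parent", "").split(","))
            if ftype in CODING_CHILDREN and parents & retyped:
                continue  # a non-coding RNA should carry no CDS/UTR
        out.append(line)
    return out
```

The two passes mean child features are handled correctly even if they appear before their parent mRNA in the file.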
Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with 'rrn' and the tRNA gene names with 'trn', as is standard, so determining the appropriate type should be straightforward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: > Set single_exon=1, and the minimum size to a smaller value. I think it's set > to 250 right now. Also est2genome is looking for an ORF, so if there is none (as > with tRNAs) they probably won't get picked up. > > --Carson > > Sent from my iPhone > > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > >> Sorry, ignore my previous question. est_forward also carries forward the >> names of protein evidence and works like a charm. Thank you! >> >> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 >> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the >> blastn output, and in the evidence_0.gff. rrn5 has perfect identity, >> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < >> eval_blastn=1e-10). How should I debug which filter is removing these hits? >> organism_type=prokaryotic >> est2genome=1 >> protein2genome=1 >> est_forward=1 >> Cheers, >> Shaun >> >> >> >> On 27 February 2014 15:17, Shaun Jackman wrote: >>> Is there a corresponding protein_forward=1 option to map forward protein >>> names from protein2genome? >>> >>> >>> Cheers, >>> Shaun >>> >>> >>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>> ) wrote: >>> >>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>> passing the gff3 to model_gff.
>>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>> >>>>> What you can do is run it once with just est_forward=1 and >>>>> est2genome/protein2genome set to 1. Then take those results, pass them in >>>>> as model_gff and use the map_forward option to then filter the results >>>>> based on mRNA score and that would copy names onto new genes under the >>>>> standard MAKER pipeline. Eventually it's really supposed to go into a >>>>> separate tool that will map genes onto new assemblies (but under the hood >>>>> the tool will just be calling MAKER with certain parameters restricted). >>>>> I do this because if people commonly use it mixed with things like SNAP I >>>>> can start to get some very weird behaviors. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: Mikael Brandström Durling >>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>> To: Carson Holt >>>>> Cc: "maker-devel at yandell-lab.org" >>>>> Subject: Re: [maker-devel] Mapping gene names >>>>> >>>>> It seems that this could be a very useful option in those cases where you >>>>> have firm a priori knowledge of the placement of ESTs. However, while >>>>> trying it I note that est_forward implies that the est2genome predictor is >>>>> turned on, implicitly. Is this necessary for this to work? I'm after the >>>>> behavior you describe below where exonerate is made to try really hard >>>>> within a limited region to align an est, but I would not like maker to >>>>> produce est2genome predictions. >>>>> >>>>> In general, I think this maker_coor and est_forward is a feature set that >>>>> is worthy to be promoted into a documented feature. >>>>> >>>>> Thanks, >>>>> Mikael >>>>> >>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>> >>>>>> It will still work without est_forward. It just works a little >>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>> stubborn or hard-to-find missing genes after reassembly of a genome.
>>>>>> >>>>>> If est_forward is provided, MAKER will parse the database to look for the >>>>>> maker_coor tags early in the pipeline. Then it will create a list of >>>>>> locations to search, and it will search them even if there are no BLAST >>>>>> results to seed the search (normally MAKER gets a BLAST result first and >>>>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >>>>>> look for a match using all of chr1 as the input to exonerate even when >>>>>> BLAST finds nothing (this is a very very slow search, but can help pick >>>>>> up one or two stubborn genes that don't remap well). To allow this, >>>>>> MAKER gives exonerate looser matching parameters (i.e. allows for single >>>>>> base pair introns perhaps caused by assembly errors). The logic here is >>>>>> that given the fact that I already told MAKER that with some degree of >>>>>> confidence I expect sequence A to map to location X, it will try its >>>>>> hardest to make it match. >>>>>> >>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at >>>>>> line 1563, but only after a BLAST alignment has already seeded it to the >>>>>> region (that BLAST result has the information in its description >>>>>> parameter). MAKER will then ignore seeds completely outside of >>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get >>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>> exactly. Also match parameters for exonerate will not be relaxed as they >>>>>> were with est_forward. >>>>>> >>>>>> As you can see, the behavior is slightly different (because it's an >>>>>> accidental feature).
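The maker_coor handling Carson describes for runs without est_forward reduces to a simple interval rule: drop a BLAST seed that lies completely outside maker_coor, and otherwise use maker_coor itself as the polishing search space. The Python below only illustrates that description; it is not MAKER's code (the real logic lives in GI.pm), and `Interval` and `restrict_seed` are hypothetical names. A maker_coor with no range, such as maker_coor=chr1, is modelled as an interval with no bounds.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    seqid: str
    start: Optional[int]  # 1-based inclusive; None means whole sequence
    end: Optional[int]


def restrict_seed(seed: Interval, coor: Interval) -> Optional[Interval]:
    """Return the search space for polishing a BLAST seed, or None to drop it."""
    if seed.seqid != coor.seqid:
        return None  # seed on another sequence: completely outside maker_coor
    if coor.start is not None and (seed.end < coor.start or seed.start > coor.end):
        return None  # no overlap with the maker_coor range: seed ignored
    return coor  # overlap: search space adjusted to match maker_coor exactly
```

With maker_coor=chr1:1000-5000, a seed at chr1:4900-6000 would be polished against chr1:1000-5000, while a seed at chr1:100-200 would be discarded.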
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Mikael Brandström Durling >>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>> To: Carson Holt >>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>> >>>>>> That might be a useful and time-saving accidental feature. But, reading >>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>> well as the configuration option est_forward, for this to work. Any >>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on >>>>>> est_forward=1, right? >>>>>> >>>>>> Mikael >>>>>> >>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>>>>> >>>>>>> Yes. That should work as well as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have >>>>>>> a database of ESTs where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to maker without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>>>>>> >>>>>>> There is a way. It's not a standard option and it's undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that. The option won't already be there, so you'll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your fasta headers, those can be used to guide the mapping and >>>>>>> naming.
For example, gene_id= will ensure different >>>>>>> isoforms that share a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, >>>>>>> and just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> BLAST alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm annotating a genome using a closely related genome from Genbank, >>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence >>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein?
>>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From felix.bemm at uni-wuerzburg.de Wed Mar 5 09:35:33 2014 From: felix.bemm at uni-wuerzburg.de (Felix Bemm) Date: Wed, 05 Mar 2014 17:35:33 +0100 Subject: [maker-devel] Build Issues - v2.31 Message-ID: <53175255.4050102@uni-wuerzburg.de> Hi, I am trying to build maker version 2.31.
Got the following error: Configuring MAKER with MPI support 'CCFLAGSEX' is not a valid config option for Inline::C at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 236 at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm line 256 Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', '/software/mpich2-1.5rc3/include', 'blib', '') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277 MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469 MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2024 Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012 Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at ./Build line 70 Same procedure worked with 2.29-beta! Any ideas? Felix -- Felix Bemm Department of Bioinformatics University of Würzburg, Germany Tel: +49 931 - 31 83696 Fax: +49 931 - 31 84552 felix.bemm at uni-wuerzburg.de From carsonhh at gmail.com Wed Mar 5 09:40:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 09:40:05 -0700 Subject: [maker-devel] Build Issues - v2.31 In-Reply-To: <53175255.4050102@uni-wuerzburg.de> References: <53175255.4050102@uni-wuerzburg.de> Message-ID: You need to update your Inline::C module. The CCFLAGSEX option was added to Inline::C a couple of years ago to allow users to pass in flags to the compiler. Thanks, Carson On 3/5/14, 9:35 AM, "Felix Bemm" wrote: >Hi, > >I am trying to build maker version 2.31.
Got the following error: > >Configuring MAKER with MPI support >'CCFLAGSEX' is not a valid config option for Inline::C > at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm >line 236 > at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm >line 256 > Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', >'/software/mpich2-1.5rc3/include', 'blib', '') called at >/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277 > MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at >/usr/share/perl/5.14/Module/Build/Base.pm line 2024 > Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', >'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007 > Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') >called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469 > MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at >/usr/share/perl/5.14/Module/Build/Base.pm line 2024 > Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', >'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012 > Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at >./Build line 70 > >Same procedure worked with 2.29-beta! > >Any ideas? 
> >Felix > >-- >Felix Bemm >Department of Bioinformatics >University of Würzburg, Germany >Tel: +49 931 - 31 83696 >Fax: +49 931 - 31 84552 >felix.bemm at uni-wuerzburg.de From carson.holt at genetics.utah.edu Wed Mar 5 12:02:26 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 5 Mar 2014 19:02:26 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: On 3/5/14, 11:59 AM, "Borhan, Hossein" wrote: >Dear Maker users > >I want to run maker on a fungal genome of about 45 Mb, with about 1/3 of >the genome being repeat-rich. But most of the virulence genes are located >within the repeat regions, flanked by stretches of repeats. I am not sure >whether, if I use the repeat masker option, I am going to miss out on the >prediction of these virulence genes located within the repeats. > >Other concerns with the settings in the maker_opts file for fungal genomes are: > >single_exon = 0: should this get changed to 1, since single-exon genes >are quite common in fungi, and what is the consequence of this on using ESTs >and assembled RNA as evidence for gene prediction? > >correct_est_fusion=0 #limits use of ESTs in annotation >to avoid fusion genes: as I understand it, this option will remove the >overlapping UTRs, but what is the consequence of setting this option on >the use of ESTs for predicting ORFs? > > >Thanks > > > >HB > > > > From carsonhh at gmail.com Wed Mar 5 12:17:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Mar 2014 12:17:57 -0700 Subject: [maker-devel] FW: maker-control file Message-ID: Not using repeat masking will cause many problems.
Besides, a gene being flanked by repeats does not mean it will be lost: any evidence/alignments that can seed in non-repetitive regions (gene/exon) are still allowed to extend into repetitive regions during the polishing stage (aligners have two stages: seed and extend). So transposons should never seed, but genes will, because their sequence will contain non-repetitive regions (even if they are near repeats). single_exon should be set to 1 for fungi; just make sure to set the minimum length of single-exon evidence (single_length) to something reasonable like 250bp. correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set, and not est2genome (which should already have been turned off by then). Thanks, Carson > > >On 3/5/14, 11:59 AM, "Borhan, Hossein" <> wrote: > >>Dear Maker users >> >>I want to run maker on a fungal genome of about 45 Mb, with about 1/3 of >>the genome being repeat-rich. But most of the virulence genes are located >>within the repeat regions, flanked by stretches of repeats. I am not sure >>whether, if I use the repeat masker option, I am going to miss out on the >>prediction of these virulence genes located within the repeats.
>> >>Other concerns with the settings in the maker_opts file for fungal genomes >>are: >> >>single_exon = 0: should this get changed to 1, since single-exon genes >>are quite common in fungi, and what is the consequence of this on using ESTs >>and assembled RNA as evidence for gene prediction? >> >>correct_est_fusion=0 #limits use of ESTs in annotation >>to avoid fusion genes: as I understand it, this option will remove the >>overlapping UTRs, but what is the consequence of setting this option on >>the use of ESTs for predicting ORFs? >> >> >>Thanks >> >> >> >>HB From marc.hoeppner at imbim.uu.se Thu Mar 6 00:26:29 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 07:26:29 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Hi, I think this is an interesting comment that I would like a bit more information on: correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results. Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set, and not est2genome (which should already have been turned off by then).
My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. I'm not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which seems problematic to me. /Marc From carsonhh at gmail.com Thu Mar 6 07:29:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 07:29:35 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: > Wouldn't it be more sensible to rely on the evidence over probabilistic > models? Yes. In fact, that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic).
Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (e.g. A. thaliana parameters to annotate C. elegans) you get as much as a 3-fold increase in exon-level accuracy by using the hint feedback from MAKER. With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). > The annotation would be partial, but on the other hand the chance of > incorporating false signals is smaller (assuming I can generate a clean set > of transcripts from RNA-seq data)? False signals are abundant. It's just the nature of how ESTs, and especially mRNAseq reads, are generated and anchored back to the assembly. By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNAseq), a lot of this is eliminated. > As an example, using SNAP and Augustus on a bird genome, with Augustus > achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of > false exons that were simply not supported by the RNAseq data, yet made it > into the final gene build. You will get false positives from the est2genome-alone approach as well. Models will be more partial, and the false negative rate will be very high (often 30-70%). Also look at Figure 1 of the MAKER2 paper. The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). > Is it possible to get some more details on how Maker uses ab initio predictions > and reconciles them with evidence alignments? At the moment it seems to me > that maker gives higher weight to the ab initio predictions, which seems > problematic to me. Take a look at the MAKER, MAKER2, and MAKER-P papers.
Final genes are chosen based on evidence overlap using AED (completely evidence-based). It is the model generation that leverages the hint-based feedback. The names of MAKER genes can tell you what the source of the model is. Any time hint-based models match the evidence better, the name will look like this -> maker---gene- (e.g. maker-chr1-snap-gene-0.4). When the ab initio model matches better than the hint-based model, the name looks like this -> --abinit-gene- (e.g. snap-chr1-abinit-gene-0.2). In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models. Thanks, Carson From: Marc Höppner Date: Thursday, March 6, 2014 at 12:26 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] FW: maker-control file Hi, I think this is an interesting comment that I would like a bit more information on: > > correct_est_fusion should not be used together with est2genome. It won't > fail; you just get odd results. Actually, est2genome should never be > used to generate the final annotation set. It is a convenience method > that allows you to generate rough models for training gene predictors like > SNAP and Augustus. But once they are trained it should be turned off, > because the models it produces will be partial (ESTs rarely cover the > whole transcript) and the results will have many false positives from > background transcription events in your EST data. These models are good > enough to train with, but make very poor final annotations. So in the end > you should be using correct_est_fusion=1 with the SNAP or Augustus set, and > not est2genome (which should already have been turned off by then). > My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process.
And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. I'm not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which seems problematic to me. /Marc From marc.hoeppner at imbim.uu.se Thu Mar 6 07:40:48 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Thu, 6 Mar 2014 14:40:48 +0000 Subject: [maker-devel] FW: maker-control file In-Reply-To: References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Hi Carson, Thanks for the detailed feedback; this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially given the near-perfect strandedness of newer protocols. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by letting Maker work with a semi-refined ab initio model.
But it can be a bit tricky to hit that sweet spot (we did manually validate more than 4000 models in order to make that sort of assessment, though). But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me: I think it would be fantastic if you guys could put together a wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that deserve a proper explanation and have only ever been discussed here on this list: best practices, experimental features, etc. Regards, Marc On 06 Mar 2014, at 15:29, Carson Holt wrote: Wouldn't it be more sensible to rely on the evidence over probabilistic models? Yes. In fact, that is the backbone of MAKER. The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic). Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (e.g. A. thaliana parameters to annotate C. elegans) you get as much as a 3-fold increase in exon-level accuracy by using the hint feedback from MAKER. With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot). The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? False signals are abundant. It's just the nature of how ESTs, and especially mRNAseq reads, are generated and anchored back to the assembly. By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNAseq), a lot of this is eliminated.
As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. You will get false positives from the est2genome-alone approach as well. Models will be more partial, and the false negative rate will be very high (often 30-70%). Also look at Figure 1 of the MAKER2 paper. The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors). Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which seems problematic to me. Take a look at the MAKER, MAKER2, and MAKER-P papers. Final genes are chosen based on evidence overlap using AED (completely evidence-based). It is the model generation that leverages the hint-based feedback. The names of MAKER genes can tell you what the source of the model is. Any time hint-based models match the evidence better, the name will look like this -> maker---gene- (e.g. maker-chr1-snap-gene-0.4). When the ab initio model matches better than the hint-based model, the name looks like this -> --abinit-gene- (e.g. snap-chr1-abinit-gene-0.2). In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models. Thanks, Carson From: Marc Höppner Date: Thursday, March 6, 2014 at 12:26 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] FW: maker-control file Hi, I think this is an interesting comment that I would like a bit more information on: correct_est_fusion should not be used together with est2genome. It won't fail; you just get odd results.
Actually, est2genome should never be used to generate the final annotation set. It is a convenience method that allows you to generate rough models for training gene predictors like SNAP and Augustus. But once they are trained it should be turned off, because the models it produces will be partial (ESTs rarely cover the whole transcript) and the results will have many false positives from background transcription events in your EST data. These models are good enough to train with, but make very poor final annotations. So in the end you should be using correct_est_fusion=1 with the SNAP or Augustus set, and not est2genome (which should already have been turned off by then). My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls. As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. I'm not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which seems problematic to me. /Marc
From carsonhh at gmail.com Thu Mar 6 08:03:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 08:03:10 -0700 Subject: [maker-devel] FW: maker-control file In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> References: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se> <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se> Message-ID: MAKER wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page Thanks, Carson From: Marc Höppner Date: Thursday, March 6, 2014 at 7:40 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] FW: maker-control file Hi Carson, Thanks for the detailed feedback; this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially given the near-perfect strandedness of newer protocols. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by letting Maker work with a semi-refined ab initio model. But it can be a bit tricky to hit that sweet spot (we did manually validate more than 4000 models in order to make that sort of assessment, though). But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me: I think it would be fantastic if you guys could put together a wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that deserve a proper explanation and have only ever been discussed here on this list: best practices, experimental features, etc. Regards, Marc On 06 Mar 2014, at 15:29, Carson Holt wrote: >> Wouldn't it be more sensible to rely on the evidence over probabilistic >> models? > > Yes. In fact, that is the backbone of MAKER. The evidence is used to derive > hints that are passed back into the predictors and reviewed in light of the > evidence to decide on final models (no longer strictly probabilistic).
Take a > look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when > you use the wrong species parameters in the predictor (e.g. A. thaliana parameters to > annotate C. elegans) you get as much as a 3-fold increase in exon-level > accuracy by using the hint feedback from MAKER. With the est2genome option you > don't get that hint feedback (normally probabilistic models, EST evidence, and > protein evidence would all work together), and the models are overall poorer > and contain more false positives (we have looked at this a lot). > > >> The annotation would be partial, but on the other hand the chance of >> incorporating false signals is smaller (assuming I can generate a clean set >> of transcripts from RNA-seq data)? > > False signals are abundant. It's just the nature of how ESTs, and especially > mRNAseq reads, are generated and anchored back to the assembly. By letting > there be feedback between the probabilistic model and the evidence (both > protein and EST/mRNAseq), a lot of this is eliminated. > > >> As an example, using SNAP and Augustus on a bird genome, with Augustus >> achieving nucleotide and exon sensitivities in the 70-90% range, gave a host >> of false exons that were simply not supported by the RNAseq data, yet made it >> into the final gene build. > > You will get false positives from the est2genome-alone approach as well. Models > will be more partial, and the false negative rate will be very high (often 30-70%). > Also look at Figure 1 of the MAKER2 paper. The false > positive rate from ab initio alone can be quite high, but with the evidence > feedback it is substantially reduced (especially for poorly trained > predictors). > > >> Is it possible to get some more details on how Maker uses ab initio >> predictions and reconciles them with evidence alignments? At the moment it >> seems to me that maker gives higher weight to the ab initio predictions, >> which seems problematic to me.
> > Take a look at the MAKER, MAKER2, and MAKER-P papers. Final genes are chosen > based on evidence overlap using AED (completely evidence-based). It is > the model generation that leverages the hint-based feedback. The names of > MAKER genes can tell you what the source of the model is. Any time hint-based > models match the evidence better, the name will look like this -> > maker---gene- (e.g. maker-chr1-snap-gene-0.4) > > When the ab initio model matches better than the hint-based model, the name looks > like this -> > --abinit-gene- (e.g. snap-chr1-abinit-gene-0.2) > > > In summary, using est2genome alone (while good for generating training sets) > undercuts the power of the evidence feedback together with the probabilistic > models. > > > Thanks, > Carson > > From: Marc Höppner > Date: Thursday, March 6, 2014 at 12:26 AM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] FW: maker-control file > > Hi, > > I think this is an interesting comment that I would like a bit more > information on: > >> >> correct_est_fusion should not be used together with est2genome. It won't >> fail; you just get odd results. Actually, est2genome should never be >> used to generate the final annotation set. It is a convenience method >> that allows you to generate rough models for training gene predictors like >> SNAP and Augustus. But once they are trained it should be turned off, >> because the models it produces will be partial (ESTs rarely cover the >> whole transcript) and the results will have many false positives from >> background transcription events in your EST data. These models are good >> enough to train with, but make very poor final annotations. So in the end >> you should be using correct_est_fusion=1 with the SNAP or Augustus set, and >> not est2genome (which should already have been turned off by then).
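The AED score Carson mentions is commonly defined as 1 - C, where the congruence C is the mean of sensitivity (SN, the fraction of evidence bases covered by the model) and specificity (SP, the fraction of model bases supported by evidence), so 0 means perfect agreement and 1 means none. The Python sketch below computes that at the nucleotide level for one model against a flat list of evidence intervals. It is a simplification under stated assumptions (1-based inclusive intervals, all evidence pooled into one set), and MAKER's own calculation may differ in how it combines ESTs, proteins and other evidence. `model_source` is a hypothetical helper that only recognizes the two naming patterns quoted in this thread.

```python
# Simplified, nucleotide-level AED sketch; not MAKER's implementation.


def covered(intervals):
    """Positions covered by 1-based, inclusive (start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def aed(model_exons, evidence_exons):
    """Annotation Edit Distance: 1 - (SN + SP) / 2 (0 = perfect agreement)."""
    model = covered(model_exons)
    evidence = covered(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare, treat as unsupported
    overlap = len(model & evidence)
    sn = overlap / len(evidence)  # fraction of evidence covered by the model
    sp = overlap / len(model)     # fraction of the model supported by evidence
    return 1.0 - (sn + sp) / 2


def model_source(name):
    """Classify a gene name using the two patterns quoted in this thread."""
    if "-abinit-gene-" in name:
        return "ab initio"
    if name.startswith("maker-"):
        return "hint-based"
    return "unknown"


print(aed([(1, 100), (201, 300)], [(1, 100), (201, 250)]))  # 0.125
print(model_source("maker-chr1-snap-gene-0.4"))             # hint-based
print(model_source("snap-chr1-abinit-gene-0.2"))            # ab initio
```

In the example, the 200 bp model is checked against 150 bp of evidence that it fully contains, so SN = 1.0 and SP = 0.75, giving an AED of 0.125. A lower AED means better evidence support, which is the sense in which final gene selection is "completely evidence-based".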
>
> My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.
>
> As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab initio predictions, which to me seems problematic.
>
> /Marc

From sjackman at gmail.com Thu Mar 6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson.

I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward.
It would be great if MAKER had an option for RNA sequence homology, similar to est2genome, that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun

On 2014-March-04 at 18:33:20, Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (SNAP, Augustus, etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and the UTR and CDS features would be omitted. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this.
Cheers,
Shaun

On 27 February 2014 17:13, Carson Holt wrote:

Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun

On 27 February 2014 15:17, Shaun Jackman wrote:

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:

Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, use the map_forward option, and filter the results based on mRNA score; that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
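Carson's suggestion to prefilter on the mRNA score before passing the GFF3 to model_gff could look like the following Python sketch. It assumes the score he means is the numeric value in column 6 of each mRNA line, that exon/CDS/UTR children point at their mRNA through a Parent= attribute, and that evidence alignments and the trailing ##FASTA section are not needed in the model_gff input. The function names and threshold are illustrative, not part of MAKER.

```python
# Hypothetical prefilter for a MAKER GFF3 before reusing it as model_gff.
# Assumes: the mRNA score is the numeric value in column 6, child features
# (exon, CDS, UTRs) reference their mRNA via Parent=, and only gene models
# (not evidence alignments or the ##FASTA section) are wanted in the output.


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(lines, min_score):
    """Keep genes, mRNAs and children whose mRNA score is >= min_score."""
    features = []
    kept_mrnas, kept_genes = set(), set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section is not copied
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        features.append((line, cols[2], attrs))
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            kept_mrnas.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))

    out = ["##gff-version 3\n"]
    for line, ftype, attrs in features:
        parents = set(attrs.get("Parent", "").split(","))
        if ftype == "gene":
            keep = attrs.get("ID") in kept_genes
        elif ftype == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:
            keep = bool(parents & kept_mrnas)  # exons, CDS, UTRs
        if keep:
            out.append(line)
    return out
```

Writing the returned lines to a new file and pointing model_gff at it, with map_forward turned on, would then follow Carson's two-pass suggestion; the cut-off is whatever score threshold suits your data.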
Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think this maker_coor and est_forward feature set is worth promoting to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt wrote:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that since I already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match.
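To make the maker_coor tag syntax concrete (maker_coor=chr1 or maker_coor=chr1:1-10000, plus gene_id=, as Carson describes in his 26 Feb replies), here is a Python sketch that parses those tags from a FASTA header and mimics the seed handling he describes in his next paragraph for runs without est_forward: seeds entirely outside the region are dropped, and overlapping seeds get their polishing window set to the region. The regular expressions and function names are hypothetical; MAKER's real parsing lives in GI.pm and may differ in detail.

```python
import re

# Hypothetical parsers for the FASTA header tags described in this thread:
#   gene_id=<id>   maker_coor=chr1   maker_coor=chr1:1-10000
COOR_RE = re.compile(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?")
GENE_ID_RE = re.compile(r"gene_id=(\S+)")


def parse_header_tags(header):
    """Return {'gene_id': str or None, 'coor': (seqid, start, end) or None}."""
    gene_id = GENE_ID_RE.search(header)
    coor = COOR_RE.search(header)
    region = None
    if coor:
        seqid, start, end = coor.groups()
        region = (seqid,
                  int(start) if start else None,
                  int(end) if end else None)
    return {"gene_id": gene_id.group(1) if gene_id else None, "coor": region}


def polishing_window(region, seed_seqid, seed_start, seed_end):
    """Mimic the behaviour described for runs WITHOUT est_forward.

    Returns None when the BLAST seed lies completely outside maker_coor,
    otherwise the region bounds, i.e. the polishing search space is set to
    maker_coor exactly. (None, None) bounds mean the whole sequence.
    """
    seqid, start, end = region
    if seed_seqid != seqid:
        return None
    if start is not None and (seed_end < start or seed_start > end):
        return None
    return (start, end)
```

For example, a header `>tx1 gene_id=geneA maker_coor=chr1:1-10000` yields the region ("chr1", 1, 10000); a BLAST seed at chr1:20000-21000 would then be ignored, while one at chr1:9000-12000 would be polished within 1-10000.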
Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt wrote:

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e. if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt wrote:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.
If you add tags to your FASTA headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the FASTA header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, February 25, 2014 at 5:06 PM
To:
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run MAKER, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
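Shaun's two requests in this message can each be expressed in a few lines: typing est2genome models as rRNA or tRNA from the standard rrn/trn name prefixes, and naming tRNAs by amino acid and anticodon (trnW-CCA). The Python sketch below is a set of hypothetical helpers, not a MAKER feature. It assumes a three-letter amino acid code and an anticodon as reported by a tRNA tool such as tRNAscan-SE, and it writes the anticodon with U in place of T, which is one common organellar convention.

```python
# Hypothetical helpers for Shaun's two requests; not MAKER functionality.
# Three-letter amino acid codes (as tRNA tools commonly report them)
# mapped to one-letter codes; "Sec" is selenocysteine.
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U",
}


def trna_name(amino_acid, anticodon):
    """Build a conventional tRNA gene name such as trnW-CCA.

    The anticodon is written in RNA letters (T -> U); use the DNA form
    instead if your reference annotation does.
    """
    code = ONE_LETTER.get(amino_acid.capitalize())
    if code is None:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return f"trn{code}-{anticodon.upper().replace('T', 'U')}"


def feature_type_from_name(gene_name):
    """Infer the GFF3 feature type from the standard rrn/trn name prefixes."""
    lowered = gene_name.lower()
    if lowered.startswith("rrn"):
        return "rRNA"
    if lowered.startswith("trn"):
        return "tRNA"
    return "mRNA"


if __name__ == "__main__":
    print(trna_name("Trp", "CCA"))          # trnW-CCA
    print(trna_name("Leu", "TAA"))          # trnL-UAA
    print(feature_type_from_name("rrn16"))  # rRNA
```

Keying the type on the name prefix only works because, as Shaun notes, his reference uses the standard rrn/trn names; other references would need a different rule.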
From carsonhh at gmail.com Thu Mar 6 13:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Yes. I'll fix the naming.

Thanks,
Carson

From carson.holt at genetics.utah.edu Thu Mar 6 16:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To:
References:
Message-ID:

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the executables for wublast are blank in maker_exe.ctl. See, they're blank ->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable

You either need to provide executables or set your blast_type parameter to something else. For example, you could set it to 'NCBI+', but you will need to fix the location of makeblastdb. makeblastdb is set incorrectly here (it points to the BLAST+ install directory rather than to the makeblastdb executable) ->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of NCBI+ makeblastdb executable

Alternatively, you can set blast_type to 'NCBI', but you will need to uncomment the executables.
Here?> formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable blastall=#/usr/local/bin/blastall #location of NCBI blastall executable ?Carson On 3/6/14, 3:51 PM, "Borhan, Hossein" wrote: >Hi > >I have installed latest version of blast+ and provided the excitable path >to the maker_exec.ctl as follow > >#-----Location of Executables Used by MAKER/EVALUATOR >makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of >NCBI+ makeblastdb executable >blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location >of NCBI+ blastn executable >blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location >of NCBI+ blastx executable >tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx >#location of NCBI+ tblastx executable >formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable >blastall=#/usr/local/bin/blastall #location of NCBI blastall executable >xdformat=#location of WUBLAST xdformat executable >blasta=#location of WUBLAST blasta executable >RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of >RepeatMasker executable >exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate >#location of exonerate executable > >#-----Ab-initio Gene Prediction Algorithms >snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable >gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3 >#location of eukaryotic genemark executable >gmhmmp= #location of prokaryotic genemark executable >augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus >executable >fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable > >#-----Other Algorithms >fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom >executable (experimental) >probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild >#location of probuild executable (required for genemark) > > > > > >But when running maker I get this error > > >STATUS: Parsing 
control files... >WARNING: blast_type is set to 'wublast' but executables cannot be located >ERROR: Please provide a valid locaction for a BLAST algorithm in the >control files. > > > > > > > From sjackman at gmail.com Thu Mar 6 16:33:04 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 6 Mar 2014 15:33:04 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn?t seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: Yes. ?I?ll fix the naming. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 1:56 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straight forward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding. The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. ?tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention? 
Cheers, Shaun On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn?t even try to do that. ?It focuses only on the coding genes. ?You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). ?So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. ?We?ve only started adding non-coding RNA support recently for iPlant, so it?s still relatively immature. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, March 4, 2014 at 7:10 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names with ?trn?, as is standard, so determining the appropriate type should be straight forward. Thanks again for your help with this. Cheers, Shaun On 27 February 2014 17:13, Carson Holt wrote: Set single_exon=1, and the minimum size to a smaller value. ?I think it's set to 250 right now. ?Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson? Sent from my iPhone On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! 
The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: Is there a corresponding?protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson? Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. ?Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. ?Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). ?I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.? Thanks, Carson From: Mikael Brandstr?m Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? 
I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward feature set is worthy of being promoted into a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward.
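The seed handling Carson describes for maker_coor without est_forward (drop BLAST seeds lying entirely outside the tagged region; polish overlapping seeds over exactly the maker_coor interval) can be sketched as below. This is a simplified illustration of the described behavior, not the GI.pm code itself; the `Region` type and `polish_region` function are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def polish_region(seed: Region, maker_coor: Region) -> Optional[Region]:
    """Return the search space for polishing a BLAST seed, or None to drop it.

    Hypothetical sketch of the behavior described for maker_coor without
    est_forward: a seed completely outside maker_coor is ignored, and a seed
    that overlaps it has its search space set to exactly maker_coor.
    """
    if seed.seqid != maker_coor.seqid:
        return None
    overlaps = seed.start <= maker_coor.end and maker_coor.start <= seed.end
    return maker_coor if overlaps else None


if __name__ == "__main__":
    coor = Region("chr1", 1, 10000)
    print(polish_region(Region("chr1", 9500, 12000), coor))   # search space becomes coor
    print(polish_region(Region("chr1", 20000, 21000), coor))  # None: seed is ignored
```

With est_forward set, Carson notes that MAKER goes further and searches the maker_coor region even with no seed at all, using relaxed exonerate parameters; that path is not modeled here.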
As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
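The FASTA header tags Carson describes (gene_id= and maker_coor=) could be read or added with a helper like the one below. The thread does not spell out the exact header syntax MAKER parses, so the sketch assumes whitespace-separated key=value tags after the sequence ID; the function names and the example gene_id values are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed header layout: ">ID key=value key=value ...", whitespace-separated.
TAG_RE = re.compile(r"(?:^|\s)(gene_id|maker_coor)=(\S+)")
COOR_RE = re.compile(r"([^:\s]+)(?::(\d+)-(\d+))?")


def parse_tags(header: str) -> Dict[str, str]:
    """Extract gene_id= and maker_coor= tags from a FASTA header line."""
    return dict(TAG_RE.findall(header))


def parse_maker_coor(value: str) -> Tuple[str, Optional[int], Optional[int]]:
    """'chr1:1-10000' -> ('chr1', 1, 10000); 'chr1' -> ('chr1', None, None)."""
    m = COOR_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"unrecognised maker_coor value: {value!r}")
    seqid, start, end = m.groups()
    if start is None:
        return seqid, None, None  # whole sequence, e.g. maker_coor=chr1
    return seqid, int(start), int(end)


def tag_header(header: str, gene_id: str, maker_coor: str) -> str:
    """Append gene_id= and maker_coor= tags to an existing FASTA header."""
    return f"{header} gene_id={gene_id} maker_coor={maker_coor}"


if __name__ == "__main__":
    h = tag_header(">rpoB_ref", "rpoB", "chr1:1-10000")
    print(h)
    print(parse_tags(h))
    print(parse_maker_coor(parse_tags(h)["maker_coor"]))
```

Tagging the evidence file this way only prepares the input; per the thread, the tags take full effect when est_forward=1 is added to maker_opts.ctl.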
--Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Mar 6 16:38:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:38:48 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Well... not really. I have no plans to add est2genome support for noncoding genes (non-trivial), so you would either have to remove the ncRNA from your input, or filter it out downstream. Thanks, Carson From: Shaun Jackman Date: Thursday, March 6, 2014 at 4:33 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Fantastic. Thanks, Carson.
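One way to follow Carson's advice to remove the ncRNA from the est input is to drop FASTA records whose gene name carries the rrn/trn prefix before running MAKER. The sketch assumes the gene name sits in a `[gene=...]` tag when one is present, and otherwise is the first word of the header; check that assumption against the actual .frn headers. All function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header line including '>', sequence)

# Assumption: gene name is in a "[gene=...]" tag if present,
# otherwise it is the first word of the header.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")
NCRNA_PREFIXES = ("rrn", "trn")


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def gene_name(header: str) -> str:
    m = GENE_TAG.search(header)
    if m:
        return m.group(1)
    words = header[1:].split()
    return words[0] if words else ""


def drop_ncrna(records: Iterable[Record]) -> List[Record]:
    """Keep only records whose gene name does not look like an rRNA or tRNA."""
    return [r for r in records
            if not gene_name(r[0]).lower().startswith(NCRNA_PREFIXES)]


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out: List[str] = []
    for header, seq in records:
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

For example, `write_fasta(drop_ncrna(read_fasta(open("NC_123456.frn"))))` would produce a coding-only est file; the file name is the placeholder from Shaun's maker_opts.ctl.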
When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models? Cheers, Shaun On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote: > Yes. I'll fix the naming. > > Thanks, > Carson > > > From: Shaun Jackman > Date: Thursday, March 6, 2014 at 1:56 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I agree that identifying non-coding RNA by homology in general is > a non-trivial problem. In my particular case, I have a well-annotated > reference species that is very closely related (99.2% sequence identity), so > lifting over the annotations from that reference species to my species should > be pretty straightforward. It would be great if MAKER had an option for RNA > sequence homology similar to est2genome that does not imply the sequence is > coding. > > The integration of MAKER-P with tRNAscan is very useful. The identified genes > are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names > with that convention? > > Cheers, > Shaun > > > On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote: >> >> Trying to call non-coding RNA from ESTs or even sequence homology is >> extremely messy (non-trivial problem in most organisms with high false >> positive rate), so MAKER for the most part doesn't even try to do that. It >> focuses only on the coding genes. You can now use tRNAscan and snoscan in >> the newest version for some non-coding RNA support (those features were only >> added a couple of months ago).
So just like other prediction tools (snap, >> augustus etc.), the primary focus has always been the coding genes. We've >> only started adding non-coding RNA support recently for iPlant, so it's still >> relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >> the tip. >> >> The rRNA genes that are found with est2genome have the feature type set to >> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. >> Ideally the feature type would be set to rRNA or tRNA as appropriate, and >> would omit the UTR and CDS features. Is that a feature that you would be >> interested in adding to MAKER? The rRNA gene names all start with "rrn" and >> the tRNA gene names with "trn", as is standard, so determining the >> appropriate type should be straightforward. >> >> Thanks again for your help with this. Cheers, >> Shaun >> >> >> >> On 27 February 2014 17:13, Carson Holt wrote: >>> Set single_exon=1, and the minimum size to a smaller value. I think it's >>> set to 250 right now. Also est2genome is looking for an ORF, so if there is >>> none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>> >>>> Sorry, ignore my previous question. est_forward also carries forward the >>>> names of protein evidence and works like a charm. Thank you! >>>> >>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They >>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>>> (2e-66 < eval_blastn=1e-10).
How should I debug which filter is removing >>>> these hits? >>>> organism_type=prokaryotic >>>> est2genome=1 >>>> protein2genome=1 >>>> est_forward=1 >>>> Cheers, >>>> Shaun >>>> >>>> >>>> >>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>>> Is there a corresponding protein_forward=1 option to map forward protein >>>>> names from protein2genome? >>>>> >>>>> Cheers, >>>>> Shaun >>>>> >>>>> >>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>>> ) wrote: >>>>>> >>>>>> Sorry, I meant to say prefilter on the score in the mRNA column before >>>>>> passing the gff3 to model_gff. >>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>>> >>>>>>> What you can do is run it once with just est_forward=1 and >>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>>> in as model_gff and use the map_forward option to then filter the >>>>>>> results based on mRNA score and that would copy names onto new genes >>>>>>> under the standard MAKER pipeline. Eventually it's really supposed to >>>>>>> go into a separate tool that will map genes onto new assemblies (but >>>>>>> under the hood the tool will just be calling MAKER with certain >>>>>>> parameters restricted). I do this because if people commonly use it >>>>>>> mixed with things like SNAP I can start to get some very weird >>>>>>> behaviors. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> From: Mikael Brandström Durling >>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> It seems that this could be a very useful option in those cases where >>>>>>> you have firm a priori knowledge of the placement of ESTs. However, >>>>>>> while trying it I note that est_forward implies that the est2genome >>>>>>> predictor is turned on, implicitly.
Is this necessary for this to work? >>>>>>> I'm after the behavior you describe below where exonerate is made to try >>>>>>> really hard within a limited region to align an est, but I would not >>>>>>> like maker to produce est2genome predictions. >>>>>>> >>>>>>> In general, I think this maker_coor and est_forward feature set >>>>>>> is worthy of being promoted into a documented feature. >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote: >>>>>>> >>>>>>> It will still work without est_forward. It just works a little >>>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>>> stubborn or hard-to-find missing genes after reassembly of a genome. >>>>>>> >>>>>>> If est_forward is provided, MAKER will parse the database to look for >>>>>>> the maker_coor tags early in the pipeline. Then it will create a list >>>>>>> of locations to search, and it will search them even if there are no >>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result >>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will >>>>>>> cause MAKER to look for a match using all of chr1 as the input to >>>>>>> exonerate even when BLAST finds nothing (this is a very very slow >>>>>>> search, but can help pick up one or two stubborn genes that don't remap >>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters >>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly >>>>>>> errors). The logic here is that given the fact that I already told >>>>>>> MAKER that with some degree of confidence I expect sequence A to map >>>>>>> to location X, it will try its hardest to make it match. >>>>>>> >>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to >>>>>>> the region (that BLAST result has the information in its description >>>>>>> parameter).
MAKER will then ignore seeds completely outside of >>>>>>> maker_coor. In addition, any BLAST seeds that overlap maker_coor will get >>>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>>> exactly. Also match parameters for exonerate will not be relaxed as >>>>>>> they were with est_forward. >>>>>>> >>>>>>> As you can see, the behavior is slightly different (because it's an >>>>>>> accidental feature). >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Mikael Brandström Durling >>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> That might be a useful and time-saving accidental feature. But, reading >>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>>> well as the configuration option est_forward for this to work. Any >>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on >>>>>>> est_forward=1, right? >>>>>>> >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>>>>>> >>>>>>> Yes. That should work as well as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ests, without affecting the naming of the final genes? I.e., if I have >>>>>>> a database of ESTs where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to maker without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>>>>>> >>>>>>> There is a way. It's not a standard option and it's undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that.
The option won't already be there so you'll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your fasta headers, those can be used to guide the mapping and >>>>>>> naming. For example, gene_id= will ensure different >>>>>>> isoforms that share a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, >>>>>>> and just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> blast alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm annotating a genome using a closely related genome from GenBank, >>>>>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence >>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein?
>>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun From sbrubaker at solazyme.com Thu Mar 6 16:41:55 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Thu, 6 Mar 2014 23:41:55 +0000 Subject: [maker-devel] Long introns from Augustus Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com> Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus. I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created? I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum. So far I don't really see any changes though. Any advice would be greatly appreciated. Thanks, Shane From carsonhh at gmail.com Thu Mar 6 16:46:53 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Mar 2014 16:46:53 -0700 Subject: [maker-devel] Long introns from Augustus Message-ID: Are these the ab initio calls that are merged, or final MAKER models?
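Shane's question is how to stop Augustus from creating long introns in the first place, which the thread does not settle. As a downstream check, models whose introns exceed a cutoff can at least be flagged from GFF3 coordinates for review. A minimal sketch, assuming GFF3 input whose exon (or CDS) features name their transcript via Parent=; the cutoff value and function names are illustrative, not an Augustus or MAKER setting.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def exons_by_transcript(gff_lines: Iterable[str], feature: str = "exon"
                        ) -> Dict[str, List[Tuple[int, int]]]:
    """Group (start, end) of one GFF3 feature type by its Parent ID(s)."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = dict(kv.strip().split("=", 1)
                     for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    return exons


def long_introns(exons: Dict[str, List[Tuple[int, int]]], max_intron: int
                 ) -> Dict[str, List[int]]:
    """Return, per transcript, the intron lengths that exceed max_intron."""
    flagged: Dict[str, List[int]] = {}
    for tx, ivs in exons.items():
        ivs = sorted(ivs)
        # The intron between exons a and b spans a_end + 1 .. b_start - 1,
        # i.e. b_start - a_end - 1 bases.
        introns = [b[0] - a[1] - 1 for a, b in zip(ivs, ivs[1:])]
        too_long = [n for n in introns if n > max_intron]
        if too_long:
            flagged[tx] = too_long
    return flagged
```

Transcripts flagged this way are candidates for fused models in a compact genome; whether to split or drop them is a separate judgment that the sketch leaves to the user.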
--Carson On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >Hi, we have a very compact genome and we are getting a lot of fused gene >models from running Augustus. I am wondering if anyone has any advice >about how to prevent introns above a certain cutoff from being created? > >I tried a couple of things, some settings in a probabilities file and >also changing a long list of probabilities to another file that someone >had suggested on a forum. So far I don't really see any changes though. > >Any advice would be greatly appreciated. > >Thanks, >Shane From sbrubaker at solazyme.com Thu Mar 6 17:48:15 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Fri, 7 Mar 2014 00:48:15 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: References: Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Thursday, March 06, 2014 3:47 PM To: Shane Brubaker; maker-devel at yandell-lab.org Subject: Re: [maker-devel] Long introns from Augustus Are these the ab initio calls that are merged, or final MAKER models? --Carson On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >Hi, we have a very compact genome and we are getting a lot of fused >gene models from running Augustus. I am wondering if anyone has any >advice about how to prevent introns above a certain cutoff from being created?
> >I tried a couple of things, some settings in a probabilities file and >also changing a long list of probabilities to another file that someone >had suggested on a forum. So far I don't really see any changes though. > >Any advice would be greatly appreciated. > >Thanks, >Shane From mikael.durling at slu.se Mon Mar 10 04:27:25 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Mon, 10 Mar 2014 10:27:25 +0000 Subject: [maker-devel] keep_preds values Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Hi, Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either. Thanks, Mikael From robert.king at rothamsted.ac.uk Mon Mar 10 06:17:07 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Mon, 10 Mar 2014 12:17:07 +0000 Subject: [maker-devel] annotation comparison aed plots Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk> Dear Maker Developers, I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence: Annotation 1: Uniprot proteins search using species keyword "fusarium" Pubmed mRNA for the name of the organism Prior annotation reference transcripts Annotation 2: Uniprot proteins search using species keyword "fusarium" Pubmed mRNA for the name of the organism Prior annotation reference transcripts mRNA trinity assembly pasafly of different strain (only RNA-seq available) I'm not sure if it was a smart move to use the prior annotation reference transcripts?
I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to gtf format using the provided script, just waiting on some perl modules to be installed by admin to test it. I'm waiting on some perl modules to be installed by our administrator to test out the "Evaluator" and "compare" programs too; what do they do? Best Wishes Rob -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning. From dence at genetics.utah.edu Mon Mar 10 08:47:42 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 10 Mar 2014 14:47:42 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se> Message-ID: Hi Mikael, The keep_preds parameter is often used the same as a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 - 1. A prediction with an AED of 0 matches the evidence exactly while a prediction with an AED of 1 isn't overlapped by any evidence. The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc).
Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. From a user standpoint, you will almost certainly lose gene models when you set AED at an intermediate value, but you might benefit by knowing that all your models will now have an AED of at least a certain value. I hope that helps; let me know if it didn't. ~Daniel PS The original paper that described the AED is Eilbeck et al. in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Mikael Brandström Durling [mikael.durling at slu.se] Sent: Monday, March 10, 2014 4:27 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] keep_preds values Hi, Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either. Thanks, Mikael From carsonhh at gmail.com Mon Mar 10 09:51:21 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 08:51:21 -0700 Subject: [maker-devel] keep_preds values Message-ID: Actually that is false. The keep_preds option is still binary. Any value other than 0 sets it to true. There was discussion about making it a non-binary value, but that has not been implemented.
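Daniel's description of AED can be made concrete. In the Eilbeck et al. (2009) definition he cites, congruence is the mean of base-level sensitivity and specificity of a prediction against the evidence, and AED = 1 - (SN + SP)/2, so 0 means exact agreement and 1 means no overlap. The sketch below computes that per model, plus the cumulative fraction of models at or below each AED threshold, which is one possible way to summarize and compare two annotations as Rob asks. It is a simplified illustration, not MAKER's implementation: it treats all evidence as one merged interval set and expands intervals into base sets, which is fine for a single gene but not tuned for whole genomes.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of positions they cover (simple, not fast)."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """AED = 1 - (SN + SP) / 2, computed at base-pair level."""
    pred, evid = covered_bases(prediction), covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # fraction of evidence bases predicted
    specificity = overlap / len(pred)  # fraction of predicted bases supported
    return 1.0 - (sensitivity + specificity) / 2.0


def cumulative_fraction(aed_values: Sequence[float],
                        thresholds: Sequence[float]) -> List[float]:
    """Fraction of models with AED <= each threshold (a cumulative distribution)."""
    if not aed_values:
        raise ValueError("no AED values given")
    n = len(aed_values)
    return [sum(v <= t for v in aed_values) / n for t in thresholds]
```

Computing `cumulative_fraction` for both annotations over the same thresholds (for example 0.0, 0.1, ..., 1.0) gives two curves; the annotation whose curve rises earlier has more of its models well supported by the evidence.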
--Carson On 3/10/14, 7:47 AM, "Daniel Ence" wrote: >Hi Mikael, > >The keep_preds parameter is often used the same as a binary parameter, >but it doesn't have to be. The concordance that is mentioned in the >comment line is the AED for that prediction. AED is a measurement of how >well a prediction is supported by the evidence and ranges from 0 - 1. A >prediction with an AED of 0 matches the evidence exactly while a >prediction with an AED of 1 isn't overlapped by any evidence. > >The default behavior for MAKER is to make a gene model out of a >prediction with any AED <1. When you change the keep_preds option from 0 >to 1, then MAKER will make a gene model out of any prediction that >matches the other parameters (like single_exon, min_exon, etc). Setting >the keep_preds option to somewhere in between 0 and 1 will set a ceiling >on the AED required for promoting a prediction to a gene model. > >From a user standpoint, you will almost certainly lose gene models >when you set AED at an intermediate value, but you might benefit by >knowing that all your models will now have an AED of at least a certain >value. > >I hope that helps; let me know if it didn't. > >~Daniel > >PS The original paper that described the AED is Eilbeck et al. in BMC >Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >Genetics paper from 2012. > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >Mikael Brandström Durling [mikael.durling at slu.se] >Sent: Monday, March 10, 2014 4:27 AM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] keep_preds values > >Hi, > >Can someone, please, explain the keep_preds parameter, as it works now >with a value between 1 and 0?
It used to be binary, but now it seems to >test concordance towards something. The maker wiki doesn't explain it any >further either. > >Thanks, >Mikael From mikael.durling at slu.se Mon Mar 10 08:57:23 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Mon, 10 Mar 2014 14:57:23 +0000 Subject: [maker-devel] keep_preds values In-Reply-To: References: Message-ID: Hi Carson and Daniel, That sounds more logical to me. Then it would be appropriate to change the comment of keep_preds in the generated config files. Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsupported model? Mikael On 10 Mar 2014, at 16:51, Carson Holt wrote: > Actually that is false. The keep_preds option is still binary. Any value > other than 0 sets it to true. There was discussion about making it a > non-binary value, but that has not been implemented. > > --Carson > > > On 3/10/14, 7:47 AM, "Daniel Ence" wrote: > >> Hi Mikael, >> >> The keep_preds parameter is often used the same as a binary parameter, >> but it doesn't have to be. The concordance that is mentioned in the >> comment line is the AED for that prediction. AED is a measurement of how >> well a prediction is supported by the evidence and ranges from 0 - 1. A >> prediction with an AED of 0 matches the evidence exactly while a >> prediction with an AED of 1 isn't overlapped by any evidence.
>> >> The default behavior for MAKER is to make a gene model out of a >> prediction with any AED <1. When you change the keep_preds option from 0 >> to 1, then MAKER will make a gene model out of any prediction that >> matches the other parameters (like single_exon, min_exon, etc). Setting >> the keep_preds option to somewhere in between 0 and 1 will set a ceiling >> on the AED required for promoting a prediction to a gene model. >> >> From a user standpoint, you will almost certainly lose gene models >> when you set AED at an intermediate value, but you might benefit by >> knowing that all your models will now have an AED of at least a certain >> value. >> >> I hope that helps; let me know if it didn't. >> >> ~Daniel >> >> PS The original paper that described the AED is Eilbeck et al. in BMC >> Bioinformatics 2009. It's also discussed in more detail in the MAKER2 >> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews >> Genetics paper from 2012. >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >> Mikael Brandström Durling [mikael.durling at slu.se] >> Sent: Monday, March 10, 2014 4:27 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] keep_preds values >> >> Hi, >> >> Can someone, please, explain the keep_preds parameter, as it works now >> with a value between 1 and 0? It used to be binary, but now it seems to >> test concordance towards something. The maker wiki doesn't explain it any >> further either.
>>
>> Thanks,
>> Mikael

From carsonhh at gmail.com Mon Mar 10 09:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To:
References:
Message-ID:

Yes. It will eventually perform an AED-like calculation between multiple predictors (i.e. if you use 3 predictors, then you require support by at least 2 predictors across all exons to get a value of 0.33). A value of 0 would be perfect concordance across all 3 predictors.

-Carson
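Daniel's explanation (quoted above) and Carson's reply both rest on AED. As a rough illustration, the Python sketch below computes an AED-style score for one prediction against one evidence set at the nucleotide level, using the congruence definition from Eilbeck et al. 2009 (AED = 1 - (sensitivity + specificity) / 2). It is not MAKER's implementation: the function names are hypothetical, and the evidence is assumed to be a flat list of intervals on the same sequence and strand, whereas MAKER builds its own evidence set from the alignments.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF3


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand (start, end) intervals into the set of positions they cover."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """AED-style score: 0 = identical to the evidence, 1 = no overlap at all."""
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    shared = len(pred & evid)
    sensitivity = shared / len(evid)  # fraction of evidence bases that are predicted
    specificity = shared / len(pred)  # fraction of predicted bases that are supported
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    exons = [(100, 199), (300, 399)]
    print(aed(exons, exons))             # identical -> 0.0
    print(aed(exons, [(1000, 1099)]))    # no overlap -> 1.0
    print(aed([(1, 100)], [(51, 150)]))  # half of each overlaps -> 0.5
```

Expanding intervals into sets of positions keeps the sketch short but is memory-hungry for long loci; a real implementation would intersect the intervals directly.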

From mikael.durling at slu.se Mon Mar 10 09:08:16 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To:
References:
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael
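Carson's reply (quoted above) describes the current behavior: keep_preds is still binary, and any value other than 0 turns it on. The sketch below restates that rule for a single control-file line. It is an illustration only: we assume the `key=value #comment` layout of MAKER's generated control files and a numeric comparison (so "0" and "0.0" both count as off); how MAKER itself parses the value is not shown, and the helper names are hypothetical.

```python
from typing import Optional, Tuple


def parse_ctl_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a 'key=value #comment' control-file line into (key, value)."""
    body = line.split("#", 1)[0].strip()
    if "=" not in body:
        return None
    key, value = body.split("=", 1)
    return key.strip(), value.strip()


def keep_preds_enabled(value: str) -> bool:
    """Binary reading: an empty value or 0 means off, anything else means on."""
    if not value:
        return False
    return float(value) != 0.0  # assumption: numeric comparison


if __name__ == "__main__":
    for line in ["keep_preds=0 #comment", "keep_preds=0.5", "keep_preds=1"]:
        key, value = parse_ctl_line(line)
        print(key, value, keep_preds_enabled(value))
```

Under this reading an intermediate value such as 0.5 behaves exactly like 1, which is why the threshold-style wording in the generated comment is misleading.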

From carsonhh at gmail.com Mon Mar 10 10:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID:

There is a value called abAED being calculated, which somewhat captures the concordance among the predictors. It is not currently printed in the GFF3, but it is used to identify the best non-overlapping ab initio prediction to put in the non-overlapping fasta file. There are a couple of things I still need to do with it, though. It's not yet normalized to take into account the absence of a predictor in the cluster of overlapping predictions.
For example, if I have 2 predictors and 2 make perfectly matching calls and 1 makes no call, they get a score of 0 because I have perfect concordance between what's there, but I really should make it 0.33 because the absence of the third predictor is meaningful. The unnormalized concordance value is fine for deciding which overlapping model to keep in the file, but not for global comparison.

-Carson

From carsonhh at gmail.com Mon Mar 10 10:18:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:18:14 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To:
References: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID:

Sorry, I meant to say "3 predictors and 2 make perfectly matching calls and 1 makes no call."
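Carson's corrected example (3 predictors, 2 perfectly matching calls, 1 predictor with no call) can be made concrete with a small sketch. This is a hypothetical scoring function, not MAKER's abAED code. We assume each predictor's agreement with the model being scored has already been reduced to a congruence in [0, 1] (for example 1 - AED against that model), and that a predictor with no call in the cluster is recorded as None. The unnormalized score averages only over predictors that made a call; the normalized score divides by the total number of predictors, so a missing call counts as zero agreement.

```python
from typing import Optional, Sequence


def ab_aed(congruences: Sequence[Optional[float]], normalize: bool) -> float:
    """Hypothetical abAED-like score for one cluster of overlapping predictions.

    congruences: one entry per predictor that was run; a value in [0, 1]
    (1.0 = identical to the model being scored) or None for "no call".
    """
    present = [c for c in congruences if c is not None]
    if not present:
        return 1.0  # nothing supports the model
    if normalize:
        # Absent predictors add 0 agreement but still count in the denominator.
        return 1.0 - sum(present) / len(congruences)
    # Unnormalized: only predictors that made a call are considered.
    return 1.0 - sum(present) / len(present)


if __name__ == "__main__":
    cluster = [1.0, 1.0, None]  # 3 predictors, 2 identical calls, 1 no call
    print(ab_aed(cluster, normalize=False))           # 0.0
    print(round(ab_aed(cluster, normalize=True), 2))  # 0.33
```

The unnormalized value is enough to rank models within one cluster, since every candidate there sees the same set of predictors; only the normalized value is comparable across clusters, matching the distinction Carson draws.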

From carsonhh at gmail.com Mon Mar 10 10:25:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:25:50 -0700
Subject: [maker-devel] annotation comparison aed plots
Message-ID:

I don't know about Michael's script, but I've always used eval. It produces sensitivity/specificity metrics. It assumes the first set of models is 100% correct, and then tells you the sensitivity/specificity values for the second set of models. It is therefore not a quality metric. Instead you should view it as a change metric.
Lower sensitivity tells you that models/exons have been lost between versions, and lower specificity tells you models/exons have been gained. There will also be a lot of generic statistics on exon/intron distribution and UTR length. Then the AED values from the MAKER run can be used independently to evaluate how well models match the evidence.

-Carson

From: "Robert King (RRes-Roth)"
Date: Monday, March 10, 2014 at 5:17 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] annotation comparison aed plots

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?

I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to gtf format using the provided script, just waiting on some perl modules to be installed by admin to test it. I'm waiting on some perl modules to be installed by our administrator to test out the "Evaluator" and "compare" programs too; what do they do?

Best Wishes
Rob

--
This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses.
However, you are fully responsible for performing any virus scanning.

From michael.s.campbell1 at gmail.com Mon Mar 10 09:50:53 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:50:53 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To:
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID:

One more point. The sensitivity, specificity, and accuracy produced by the compare_annotations_3.2.pl script are gene level, and overlap between annotation sets is defined very liberally: at least one nucleotide of exon overlap.

Mike
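The gene-level overlap rule Michael describes (two genes match if any of their exons share at least one nucleotide) is easy to state in code. The sketch below is not compare_annotations_3.2.pl, which was attached to the thread and is not reproduced here; it is a minimal Python illustration that treats one annotation as the reference (as eval does in Carson's description), ignores strand, and uses hypothetical names. Sensitivity is the fraction of reference genes hit by at least one query gene; specificity is the fraction of query genes that hit at least one reference gene. The script's "accuracy" figure is left out because its definition is not given in the thread.

```python
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int]         # (seqid, start, end), 1-based inclusive
Annotation = Dict[str, List[Exon]]  # gene ID -> its exons


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one nucleotide on the same sequence."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_overlap(g1: List[Exon], g2: List[Exon]) -> bool:
    """Liberal gene-level match: any pair of exons overlaps by >= 1 nt."""
    return any(exons_overlap(x, y) for x in g1 for y in g2)


def gene_level_sn_sp(reference: Annotation, query: Annotation) -> Tuple[float, float]:
    found = sum(
        1 for ref in reference.values()
        if any(genes_overlap(ref, q) for q in query.values())
    )
    supported = sum(
        1 for q in query.values()
        if any(genes_overlap(q, ref) for ref in reference.values())
    )
    sensitivity = found / len(reference) if reference else 0.0
    specificity = supported / len(query) if query else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    old = {"g1": [("chr1", 100, 200)], "g2": [("chr1", 500, 600)]}
    new = {"n1": [("chr1", 200, 250)], "n2": [("chr1", 900, 950)]}
    print(gene_level_sn_sp(old, new))
```

Read as a change metric, with the old annotation as reference: a drop in sensitivity means old genes have no counterpart in the new set, and a drop in specificity means the new set contains genes with no counterpart in the old one. The all-pairs comparison is only for clarity; a real tool would index exons by sequence and position.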
--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From cjfields at illinois.edu Mon Mar 10 09:52:50 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 10 Mar 2014 15:52:50 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID:

I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in # models overall, which is worse. GeneMark had a very similar problem (high # false positives) and thus gave no marked improvement, either when used with both Augustus and SNAP or with Augustus alone.

I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past. This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.

Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add these like a 'plugin'?

chris

From carsonhh at gmail.com Mon Mar 10 11:05:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 10:05:24 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID:

Adding a new predictor can take some time. It obviously requires some coding. It's usually not too hard just to convert results to GFF3 and then pass it in. Integrated support is really only beneficial for predictors that can take "hints"
from evidence alignments (for example, we are working on EVM integration right now - http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi).

Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't lose any true genes if just using Augustus. Also try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.

Thanks,
Carson
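Carson's suggestion for a predictor without integrated support, such as geneid, is to convert its results to GFF3 and pass that in. A minimal, hypothetical converter step is sketched below: it renders one predicted model (already parsed into exon coordinates) as a GFF3 `match` feature with `match_part` children, which, as far as we know, is the layout MAKER's own GFF3 output uses for ab initio predictions. Parsing geneid's native output is not shown, and whether MAKER's external-prediction input (the pred_gff control option) expects exactly this layout is an assumption to check against the MAKER documentation.

```python
from typing import List, Tuple


def model_to_gff3(seqid: str, source: str, model_id: str, strand: str,
                  exons: List[Tuple[int, int]]) -> List[str]:
    """Render one predicted model as a GFF3 match feature plus match_part exons.

    exons: 1-based inclusive (start, end) pairs in any order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    lines = ["\t".join([seqid, source, "match", str(start), str(end), ".",
                        strand, ".", f"ID={model_id};Name={model_id}"])]
    for i, (s, e) in enumerate(exons, start=1):
        lines.append("\t".join([seqid, source, "match_part", str(s), str(e), ".",
                                strand, ".", f"ID={model_id}:exon{i};Parent={model_id}"]))
    return lines


if __name__ == "__main__":
    print("##gff-version 3")
    for row in model_to_gff3("scaffold_1", "geneid", "geneid_model_1", "+",
                             [(3000, 3200), (1000, 1150)]):
        print(row)
```

Sorting the exons first makes the parent `match` span from the first exon's start to the last exon's end regardless of the order the predictor reported them in.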

From michael.s.campbell1 at gmail.com Mon Mar 10 09:47:50 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:47:50 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID:

Hi Robert,

Here are the scripts that were mentioned before.

The AED_cdf_generator.pl script is for making cumulative distribution function plots based on annotation edit distance. This script is quite simple and straightforward in its internals.

The compare_annotations_3.2.pl script is for generating summary stats for annotations and will compare two annotations of the same assembly.

You can run either script without arguments to get a usage statement.

Thanks,
Mike
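To illustrate what a cumulative distribution plot of AED involves, here is a small Python sketch; it is not AED_cdf_generator.pl, which was attached to the thread. It assumes each transcript's AED is stored as an `_AED=` attribute on the mRNA lines of MAKER's GFF3 output, and reports, for each AED threshold, the fraction of transcripts at or below it. Plotting those pairs for two annotations on one chart gives the kind of comparison Robert asked for. Function names and the sample GFF3 lines are made up for illustration.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence, Tuple


def aed_values_from_gff3(lines: Iterable[str]) -> List[float]:
    """Collect _AED values from the column-9 attributes of mRNA features."""
    values: List[float] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            if field.startswith("_AED="):
                values.append(float(field[len("_AED="):]))
    return values


def aed_cdf(values: Sequence[float], thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Fraction of transcripts with AED <= t, for each threshold t."""
    if not values:
        return [(t, 0.0) for t in thresholds]
    ordered = sorted(values)
    return [(t, bisect_right(ordered, t) / len(ordered)) for t in thresholds]


if __name__ == "__main__":
    gff = [  # made-up example lines
        "##gff-version 3",
        "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=a-mRNA-1;Parent=a;_AED=0.10",
        "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=b-mRNA-1;Parent=b;_AED=0.45",
        "chr1\tmaker\texon\t1\t900\t.\t+\t.\tID=a-mRNA-1:1;Parent=a-mRNA-1",
    ]
    values = aed_values_from_gff3(gff)
    steps = [i / 40 for i in range(41)]  # 0.000, 0.025, ..., 1.000
    for t, frac in aed_cdf(values, steps):
        print(f"{t:.3f}\t{frac:.3f}")
```

A curve that rises earlier (more transcripts at low AED) indicates an annotation that agrees better with its evidence.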
--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl
Type: text/x-perl-script
Size: 2580 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl Type: text/x-perl-script Size: 29155 bytes Desc: not available URL: From sajeet at gmail.com Mon Mar 10 12:31:40 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Mon, 10 Mar 2014 11:31:40 -0700 Subject: [maker-devel] geneid (or alternative ab initio predictors) In-Reply-To: References: Message-ID: One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard-masked genome. On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote: > Adding a new predictor can take some time. It obviously requires some > coding. It's usually not too hard just to convert results to GFF3 and > then pass it in. Integrated support is really only beneficial for > predictors that can take "hints" from evidence alignments (for example we > are working on EVM integration right now - > http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give > problems just drop them. GeneMark really doesn't work very well on > genomes with complex intron/exon structure (and I really wouldn't use it > for anything but fungi). > > Make sure you are also giving sufficient protein evidence. Perhaps all > proteins from chicken and pigeon for example. Then you shouldn't find > loss of any true genes if just using Augustus. Also try not to use gene > count as an indicator of performance. The value is very deceptive, > especially if the genome assembly is fragmented. > > Thanks, > Carson > > > > On 3/10/14, 8:52 AM, "Fields, Christopher J" > wrote: > > >I have been running MAKER 2.31 using Augustus and SNAP on an avian > >genome. Augustus gives pretty decent gene model predictions based on a > >custom model we have and the hints MAKER provides. However, SNAP seems > >to throw out a ton of false positives; in many cases this appears to > >cause erroneous gene fusions.
Leaving out SNAP altogether, however, leads > >to a marked decrease in # models overall, which is worse. GeneMark had a > >very similar problem (high # false positives) and thus no marked > >improvement, either when used with both Augustus and SNAP or with > >Augustus alone. > > > >I have been exploring using geneid > >(http://genome.crg.es/software/geneid/) as an alternative, based on some > >feedback on another project I worked with in the past. This would be > >fed into MAKER using external GFF, but I wanted to see if anyone has > >tried geneid with MAKER first. > > > >Finally, how hard would it be to incorporate alternative callers into > >MAKER? For instance, would it be possible to add these like a 'plugin'? > > > >chris From carsonhh at gmail.com Mon Mar 10 22:13:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Mar 2014 22:13:43 -0600 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
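One quick way to check whether fused models carry implausibly long introns is to compute intron lengths from each model's exon coordinates and flag anything above a cutoff. The Python sketch below is a hypothetical helper, not part of MAKER or Augustus. It assumes exons are 1-based inclusive (start, end) pairs as in GFF3, and the model IDs, coordinates, and 3,000 bp cutoff are invented for illustration. It only reports long introns; the MAKER setting for expected maximum intron size in evidence alignments is assumed here to be `split_hit` in maker_opts.ctl, so check your own control file.

```python
from typing import Dict, List, Tuple

# An exon as a 1-based, inclusive (start, end) pair, as in GFF3 columns 4 and 5.
Exon = Tuple[int, int]


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Length of each intron between consecutive exons (after sorting by start)."""
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def flag_long_introns(models: Dict[str, List[Exon]], max_intron: int) -> Dict[str, int]:
    """Map model ID to its longest intron, for models whose longest intron exceeds max_intron."""
    flagged: Dict[str, int] = {}
    for model_id, exons in models.items():
        lengths = intron_lengths(exons)
        if lengths and max(lengths) > max_intron:
            flagged[model_id] = max(lengths)
    return flagged


if __name__ == "__main__":
    # Hypothetical models; IDs and coordinates are invented for this example.
    models = {
        "model-1": [(100, 400), (461, 900)],      # one 60 bp intron
        "model-2": [(1000, 1300), (9301, 9800)],  # one 8,000 bp intron
        "model-3": [(20000, 20900)],              # single exon, no introns
    }
    print(flag_long_introns(models, max_intron=3000))  # {'model-2': 8000}
```

Flagged models can then be compared against the evidence alignments before deciding whether a tighter intron limit is warranted.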
--Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker? I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that someone >> had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane From darasappan at gmail.com Mon Mar 10 14:14:03 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 10 Mar 2014 15:14:03 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing Message-ID: Hello, I've been running maker with different assembly files, reference files etc and I check the output by: 1. concatenating the gff files 2. concatenating the *transcripts.fasta files 3.
concatenating the *proteins.fasta files I'm noticing that when I ran maker twice with same parameters, the second time around, many of the output subdirectories do not have a *transcripts.fasta or *proteins.fasta file in it. There are 251 subdirectories and only 97 of them have all 3 output files. Maker log looks ok to me, but I've attached it here as well. What could be the reason for this? Thanks dhivya -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.o1813247.gz Type: application/x-gzip Size: 13857217 bytes Desc: not available URL: From sbrubaker at solazyme.com Tue Mar 11 11:06:57 2014 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Tue, 11 Mar 2014 17:06:57 +0000 Subject: [maker-devel] Long introns from Augustus In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> References: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com> <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com> Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com> Ok thank you. -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, March 10, 2014 9:14 PM To: Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Long introns from Augustus Maybe. The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes. --Carson Sent from my iPhone > On Mar 6, 2014, at 5:48 PM, Shane Brubaker wrote: > > Actually these are calls directly from Augustus (without using Maker). They are not purely ab initio in that they are using hints from RNA-Seq data. > > I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?
I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence. > > > -----Original Message----- > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: Thursday, March 06, 2014 3:47 PM > To: Shane Brubaker; maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Long introns from Augustus > > Are these the ab initio calls that are merged or final MAKER models? > > --Carson > > >> On 3/6/14, 4:41 PM, "Shane Brubaker" wrote: >> >> Hi, we have a very compact genome and we are getting a lot of fused >> gene models from running Augustus. I am wondering if anyone has any >> advice about how to prevent introns above a certain cutoff from being created? >> >> I tried a couple of things, some settings in a probabilities file and >> also changing a long list of probabilities to another file that >> someone had suggested on a forum. So far I don't really see any changes though. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> Shane > > From carson.holt at genetics.utah.edu Thu Mar 13 10:00:06 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 13 Mar 2014 16:00:06 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Just resending this to the correct maker-devel address. When replying, please do not CC the incorrect maker-devel-bounce address. Thanks, Carson On 3/13/14, 9:56 AM, "Carson Holt" wrote: >FGENESH is not a heavily used tool, so depending on which version it is >(either too old or too new), output might be slightly different, which >could cause incorrect parsing.
Could you tar up your maker.output folder, >and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >(then send me your user/guest ID after you upload). > >For the BLAST error, use BLAST+ instead. You are using blastall, which is >the old legacy version of NCBI BLAST. You can do this by setting the >blast type in maker_bopts.ctl and the location of executables in >maker_exe.ctl. > >Thanks, >Carson > > > >On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote: > >>Dear Maker users >> >> >>I ran maker (2.31) on a fungal genome and found out that it inserted the >>word SCALAR followed by a pair of brackets like this (0x22de7020) >>inserted in the nucleotide sequence of some of the genes. This seems to >>be related to transcripts predicted by fgenesh_masked. >> >> >>Here is an example for one of the genes >> >> >>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>TTTCGACAAGC >> >>The same genome sequence was used for the first round of maker (2.10) >>without such a problem.
I checked the sequence for the scaffold related to >>one of the affected transcripts and there was no error in the sequence. >>I am not sure what is causing this. The only error that I could spot in >>the output error file is the following: >> >> >>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>contexts/frames of query sequences. >> >> >> >>Your help is appreciated >> >> >> >>HB >> >> >> >> >> >> > From carsonhh at gmail.com Thu Mar 13 10:14:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:14:54 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: Note protein/transcript fastas are only created when there are gene models to output to those files (so their absence means there were no gene models for that contig). Most sequences without protein/transcript fastas in your sample are very short and thus don't contain anything. What is left either have no est2genome results or the est2genome alignments do not have sufficient open reading frame to be turned into a gene model (false merging of regions by trinity can cause this, so make sure you use the jaccard index option when assembling reads with trinity to avoid this). You are using only the est2genome=1 option. This will result in a limited set of genes that can be used for training SNAP/Augustus (so not getting results on all contigs is expected). You really won't get much as far as results until you have one of the ab initio predictors turned on. Thanks, Carson From: dhivya arasappan Date: Tuesday, March 11, 2014 at 8:52 AM To: Carson Holt Cc: Daniel Ence Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Alright done. My username is daras Thanks Dhivya On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > Input and compressed file of output.
> > Thanks, > Carson > > From: dhivya arasappan > Date: Monday, March 10, 2014 at 2:09 PM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Do you mean the whole maker output? > > Thanks > dhivya > > On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: > >> Could you upload everything here: >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then send us the link generated or your user ID. >> >> Thanks, >> Carson >> >> >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 1:50 PM >> To: Carson Holt , Daniel Ence >> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson and Daniel, >> >> I'm sending this across to you separately since maker list is blocking my >> email due to attachment size. >> >> As always, thanks for any guidance you can provide. >> Dhivya >> >> >> Begin forwarded message: >> >>> From: dhivya arasappan >>> Date: March 10, 2014 3:14:03 PM CDT >>> To: maker-devel at yandell-lab.org >>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>> >>> >>> Hello, >>> >>> I've been running maker with different assembly files, reference files etc >>> and I check the output by: >>> >>> 1. concatenating the gff files >>> 2. concatenating the *transcripts.fasta files >>> 3. concatenating the *proteins.fasta files >>> >>> I'm noticing that when I ran maker twice with same parameters, the second >>> time around, many of the output subdirectories do not have a >>> *transcripts.fasta or *proteins.fasta file in it. >>> There are 251 subdirectories and only 97 of them have all 3 output files. >>> Maker log looks ok to me, but I've attached it here as well. >>> >>> What could be the reason for this? >>> >>> Thanks >>> dhivya >>> >>> >>> >>> >>> >> >
From carsonhh at gmail.com Thu Mar 13 10:55:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 10:55:40 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Message-ID: The second time, it should have just started where it left off, so it would run faster (because the processing from the previous job counted towards the second one). The archived output you sent me had 21,183 proteins and transcripts. If you are using the fasta_merge to collect them, just make sure the datastore.index file is not truncated or corrupt otherwise it won't collect all the fastas from every contig. You can rebuild the datastore.index using the -dsindex flag with MAKER, if you want to check that. Also you can have maker just regenerate results without rerunning BLAST etc., by using the -a flag if you want to just recalculate all results quickly (rebuilds all FASTA and GFF3 without redoing most analysis). --Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 10:47 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Thanks Carson for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ests based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with > 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts.
My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output. When I reran maker, just changing the time allocated, it completed much faster, but is giving much fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene models to > output to those files (so their absence means there were no gene models for > that contig). Most sequences without protein/transcript fastas in your sample > are very short and thus don't contain anything. What is left either have no > est2genome results or the est2genome alignments do not have sufficient open > reading frame to be turned into a gene model (false merging of regions by > trinity can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a limited set > of genes that can be used for training SNAP/Augustus (so not getting results > on all contigs is expected). You really won't get much as far as results > until you have one of the ab initio predictors turned on.
> > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Alright done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output. >> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here: >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is blocking my >>> email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing >>>> >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference files etc >>>> and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3.
concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, the second >>>> time around, many of the output subdirectories do not have a >>>> *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 output files. >>>> Maker log looks ok to me, but I've attached it here as well. >>>> >>>> What could be the reason for this? >>>> >>>> Thanks >>>> dhivya >>>> >>>> >>>> >>>> >>>> >>> >> > From darasappan at gmail.com Thu Mar 13 10:47:25 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 13 Mar 2014 11:47:25 -0500 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> Thanks Carson for the response. I understand that est2genome=1 does not use any ab initio gene predictions, but simply identifies ests based on alignment. I'm a little confused because I ran maker on my assembly before, using the same parameters (including est2genome=1). I got a very good result with > 20,000 transcripts and proteins. Then I was able to get an improved assembly, where many scaffolds were combined into superscaffolds. So I reran maker on this assembly. Same parameters, same transcriptome and proteins files. Now, I see such drastically different results: Only 500+ genes and transcripts. My scaffolds are now bigger than before, so I'm not sure how this is happening. These were the results I sent you. Another odd thing I noticed (and I am hesitant to report this because perhaps it is due to some sort of error on my part): I ran maker on the improved assembly the first time and maker did not complete in the 48 hours I allocated. But I had 19,000+ transcripts in the unfinished output.
When I reran maker, just changing the time allocated, it completed much faster, but is giving much fewer transcripts and proteins as output. Could something like this happen? If not, then I'm guessing I must have changed something although I'm pretty sure that I did not change anything other than the time allocated. I've attached the transcripts and proteins files from the first time I ran maker on my improved assembly. Thanks again for your help Dhivya On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: > Note protein/transcript fastas are only created when there are gene > models to output to those files (so their absence means there were > no gene models for that contig). Most sequences without protein/ > transcript fastas in your sample are very short and thus don't > contain anything. What is left either have no est2genome results or > the est2genome alignments do not have sufficient open reading frame > to be turned into a gene model (false merging of regions by trinity > can cause this, so make sure you use the jaccard index option when > assembling reads with trinity to avoid this). > > You are using only the est2genome=1 option. This will result in a > limited set of genes that can be used for training SNAP/Augustus (so > not getting results on all contigs is expected). You really won't > get much as far as results until you have one of the ab initio > predictors turned on. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Tuesday, March 11, 2014 at 8:52 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: maker output- transcripts.fasta and proteins.fasta > files missing > > Alright done. My username is daras > > Thanks > Dhivya > > On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: > >> Input and compressed file of output.
>> >> Thanks, >> Carson >> >> From: dhivya arasappan >> Date: Monday, March 10, 2014 at 2:09 PM >> To: Carson Holt >> Cc: Daniel Ence >> Subject: Re: maker output- transcripts.fasta and proteins.fasta >> files missing >> >> Hi Carson, >> >> Do you mean the whole maker output? >> >> Thanks >> dhivya >> >> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >> >>> Could you upload everything here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> Then send us the link generated or your user ID. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: dhivya arasappan >>> Date: Monday, March 10, 2014 at 1:50 PM >>> To: Carson Holt , Daniel Ence >> > >>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta >>> files missing >>> >>> Hi Carson and Daniel, >>> >>> I'm sending this across to you separately since maker list is >>> blocking my email due to attachment size. >>> >>> As always, thanks for any guidance you can provide. >>> Dhivya >>> >>> >>> Begin forwarded message: >>> >>>> From: dhivya arasappan >>>> Date: March 10, 2014 3:14:03 PM CDT >>>> To: maker-devel at yandell-lab.org >>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Hello, >>>> >>>> I've been running maker with different assembly files, reference >>>> files etc and I check the output by: >>>> >>>> 1. concatenating the gff files >>>> 2. concatenating the *transcripts.fasta files >>>> 3. concatenating the *proteins.fasta files >>>> >>>> I'm noticing that when I ran maker twice with same parameters, >>>> the second time around, many of the output subdirectories do not >>>> have a *transcripts.fasta or *proteins.fasta file in it. >>>> There are 251 subdirectories and only 97 of them have all 3 >>>> output files. Maker log looks ok to me, but I've attached it >>>> here as well. >>>> >>>> What could be the reason for this?
>>>> >>>> Thanks >>>> dhivya >>>> >>> >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: transcripts.cat.fasta.old.gz Type: application/x-gzip Size: 7927581 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proteins.cat.fasta.old.gz Type: application/x-gzip Size: 3668381 bytes Desc: not available URL: From carsonhh at gmail.com Thu Mar 13 12:53:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Mar 2014 12:53:05 -0600 Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta files missing In-Reply-To: References: <64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com> <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com> <672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com> <5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com> Message-ID: For future reference, I suggest using the .../maker/bin/fasta_merge tool to merge based on the datastore.index rather than other command-line-based methods. It will handle the multiple fasta types that are produced in the results, and will validate with the datastore.index file. Example: fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log The same is also true when merging gff3 files. gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log Thanks, Carson From: dhivya arasappan Date: Thursday, March 13, 2014 at 12:48 PM To: Carson Holt Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing Ah, I forgot that some were called superscaffolds. That is a difference between the old and new assembly. This was definitely the issue. Thanks and sorry for the mix-up.
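The mix-up above came from a glob (`sca*`) that silently skipped contigs named "superscaffold...". Carson's fix is to let fasta_merge walk the master datastore index instead of matching on contig names. The Python sketch below applies the same idea to counting, so totals can be cross-checked; it is not how fasta_merge is implemented. It assumes index lines are tab-separated `contig`, `directory`, `status`, with directories relative to the folder holding the index log, and that final models are written to `<contig>.maker.transcripts.fasta`. Verify both assumptions against your own output. Any status other than FINISHED is treated as not done.

```python
from pathlib import Path
from typing import Dict


def finished_contig_dirs(index_log: Path) -> Dict[str, Path]:
    """Return {contig: directory} for contigs whose last logged status is FINISHED.

    Assumed line format: contig<TAB>directory<TAB>status, with the directory
    relative to the folder that holds the index log.
    """
    last_status: Dict[str, str] = {}
    dirs: Dict[str, Path] = {}
    with index_log.open() as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or truncated lines
            contig, rel_dir, status = fields[0], fields[1], fields[2]
            last_status[contig] = status  # later lines override earlier ones
            dirs[contig] = index_log.parent / rel_dir
    return {c: dirs[c] for c, s in last_status.items() if s == "FINISHED"}


def count_fasta_records(contig_dirs: Dict[str, Path],
                        pattern: str = "*.maker.transcripts.fasta") -> int:
    """Count FASTA header lines in matching files, whatever each contig is named."""
    total = 0
    for directory in contig_dirs.values():
        for fasta in directory.glob(pattern):
            with fasta.open() as fh:
                total += sum(1 for line in fh if line.startswith(">"))
    return total


if __name__ == "__main__":
    import tempfile

    # Build a tiny, made-up datastore holding a "scaffold" and a "superscaffold".
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        lines = []
        for contig, n_models in [("scaffold1", 2), ("superscaffold7", 3)]:
            d = root / "demo_datastore" / "AA" / "BB" / contig
            d.mkdir(parents=True)
            records = "".join(f">{contig}-mRNA-{i}\nACGT\n" for i in range(n_models))
            (d / f"{contig}.maker.transcripts.fasta").write_text(records)
            rel = d.relative_to(root).as_posix()
            lines += [f"{contig}\t{rel}/\tSTARTED", f"{contig}\t{rel}/\tFINISHED"]
        index = root / "demo_master_datastore_index.log"
        index.write_text("\n".join(lines) + "\n")
        done = finished_contig_dirs(index)
        print(len(done), count_fasta_records(done))  # 2 5
```

Because contigs are taken from the index rather than from a name pattern, the superscaffold is counted alongside the ordinary scaffold.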
Dhivya On Mar 13, 2014, at 12:51 PM, Carson Holt wrote: > Note that your command does not capture everything because not all scaffolds > start with the name "scaffold". > > This works, though: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, March 13, 2014 at 11:34 AM > To: Carson Holt > Subject: Re: maker output- transcripts.fasta and proteins.fasta files missing > > Hi Carson, > > Am I looking in the wrong place for my fasta files? I looked here: > > ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l > > I see only 97 such files- so 97 contigs with transcripts.fasta files? > > When I count the number of sequences in all these files, I get 514 sequences. > > grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}' > > Could you tell how and where you are getting the 21,183 transcripts? > > thanks > dhivya > > On Mar 13, 2014, at 12:21 PM, Carson Holt wrote: > >> This is what I see in your uploaded data. There are 21,183 transcripts from >> 201 contigs. Then there are 707 contigs with no gene models. >> >> --Carson >> >> >> From: Carson Holt >> Date: Thursday, March 13, 2014 at 11:11 AM >> To: dhivya arasappan >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> "as you saw from the output I uploaded before, the output certainly was much >> less than 20,000 transcripts" >> >> Actually there were 21,183 in the output you uploaded. I saw no loss of >> entries. >> >> --Carson >> >> From: dhivya arasappan >> Date: Thursday, March 13, 2014 at 11:09 AM >> To: Carson Holt >> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >> missing >> >> Hi Carson, >> >> The datastore.index file looks fine- it has a started and finished status for >> my 980 scaffolds. I reran with increased time twice.
>> Second time around, I >> actually deleted the entire output directory to make sure it runs all over >> again. It still seemed to complete within a day. As you saw from the output >> I uploaded before, the output certainly was much less than 20,000 >> transcripts. Given that I was seeing great results for an older version of my >> assembly, I'm puzzled as to why my results are worse this time around. Any >> suggestions of what to check or what I can do to see improved results would >> be really helpful. >> >> I do know that I went from ~4% gaps to ~6% gaps in my new assembly- other >> than that, it's better in every way. Could this cause such a dramatic >> difference in results? >> >> Thanks >> dhivya >> >> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote: >> >>> The second time, it should have just started where it left off, so it would >>> run faster (because the processing from the previous job counted towards the >>> second one). The archived output you sent me had 21,183 proteins and >>> transcripts. If you are using the fasta_merge to collect them, just make >>> sure the datastore.index file is not truncated or corrupt otherwise it won't >>> collect all the fastas from every contig. You can rebuild the >>> datastore.index using the -dsindex flag with MAKER, if you want to check >>> that. Also you can have maker just regenerate results without rerunning >>> BLAST etc., by using the -a flag if you want to just recalculate all results >>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis). >>> >>> --Carson >>> >>> >>> From: dhivya arasappan >>> Date: Thursday, March 13, 2014 at 10:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>> missing >>> >>> Thanks Carson for the response. I understand that est2genome=1 does not use >>> any ab initio gene predictions, but simply identifies ests based on >>> alignment.
I'm a little confused because I ran maker on my assembly before, >>> using the same parameters ( including est2genome=1). I got a very good >>> result with > 20,000 transcripts and proteins. >>> >>> Then I was able to get an improved assembly, where many scaffolds were >>> combined into superscaffolds. So I reran maker on this assembly. Same >>> parameters, same transcriptome and proteins files. Now, I see such >>> drastically different results: Only 500+ genes and transcripts. My >>> scaffolds are now bigger than before, so I'm not sure how this is happening. >>> These were the results I sent you. >>> >>> Another odd thing I noticed (and I am hesitant to report this because >>> perhaps it is due to some sort of error on my part): I ran maker on the >>> improved assembly the first time and maker did not complete in the 48 hours >>> I allocated. But I had 19,000+ transcripts in the unfinished output. When >>> I reran maker, just changing the time allocated, it completed much faster, >>> but is giving much fewer transcripts and proteins as output. Could >>> something like this happen? If not, then I'm guessing I must have changed >>> something although I'm pretty sure that I did not change anything other than >>> the time allocated. I've attached the trascripts and proteins files from the >>> first time I ran maker on my improved assembly. >>> >>> Thanks again for your help >>> Dhivya >>> >>> >>> >>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote: >>> >>>> Note protein/transcript fasts are only created when there are gene models >>>> to output to those files (so their absence means there were no gene models >>>> for that contig). Most sequences without protein/transcript fasts in your >>>> sample are very short and thus don?t contain anything. 
What is left either >>>> have no est2genome results or the est2genome alignments do not have >>>> sufficient open reading frame to be turned into a gene model (false merging >>>> of regions by trinity can cause this, so make sure you use the jaccard >>>> index option when assembling reads with trinity to avoid this). >>>> >>>> You are using only the est2genome=1 option. This will result in a limited >>>> set of genes that can be used for training SNAP/Augustus (so not getting >>>> results on all contigs is expected). You really won?t get much as far as >>>> results until you have one of the ab initio predictors turned on. >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> From: dhivya arasappan >>>> Date: Tuesday, March 11, 2014 at 8:52 AM >>>> To: Carson Holt >>>> Cc: Daniel Ence >>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>> missing >>>> >>>> Alright done. My username is daras >>>> >>>> Thanks >>>> Dhivya >>>> >>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote: >>>> >>>>> Input and compressed file of output. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: dhivya arasappan >>>>> Date: Monday, March 10, 2014 at 2:09 PM >>>>> To: Carson Holt >>>>> Cc: Daniel Ence >>>>> Subject: Re: maker output- transcripts.fasta and proteins.fasta files >>>>> missing >>>>> >>>>> Hi Carson, >>>>> >>>>> Do you mean the whole maker output? >>>>> >>>>> Thanks >>>>> dhivya >>>>> >>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote: >>>>> >>>>>> Could you upload everything here ?> >>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>> >>>>>> Than send us the link generated or your user ID. 
>>>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> >>>>>> From: dhivya arasappan >>>>>> Date: Monday, March 10, 2014 at 1:50 PM >>>>>> To: Carson Holt , Daniel Ence >>>>>> >>>>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta files >>>>>> missing >>>>>> >>>>>> Hi Carson and Daniel, >>>>>> >>>>>> I'm sending this across to you separately since maker list is blocking my >>>>>> email due to attachment size. >>>>>> >>>>>> As always, thanks for any guidance you can provide. >>>>>> Dhivya >>>>>> >>>>>> >>>>>> Begin forwarded message: >>>>>> >>>>>>> From: dhivya arasappan >>>>>>> Date: March 10, 2014 3:14:03 PM CDT >>>>>>> To: maker-devel at yandell-lab.org >>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files >>>>>>> missing >>>>>>> >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I've been running maker with different assembly files, reference files >>>>>>> etc and I check the output by: >>>>>>> >>>>>>> 1. concatenating the gff files >>>>>>> 2. concatenating the *transcripts.fasta files >>>>>>> 3. concatenating the *proteins.fasta files >>>>>>> >>>>>>> I'm noticing that when I ran maker twice with same parameters, the >>>>>>> second time around, many of the output subdirectories do not have a >>>>>>> *transcripts.fasta or *proteins.fasta file in it. >>>>>>> There are 251 subdirectories and only 97 of them have all 3 output >>>>>>> files. Maker log looks ok to me, but I've attached it here as well. >>>>>>> >>>>>>> What could be the reason for this? >>>>>>> >>>>>>> Thanks >>>>>>> dhivya >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
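This thread uses two separate counting commands: `ls ... | wc -l` counts files, and `grep -c '^>'` counts records. Carson also advises checking that the datastore index is not truncated. The Python sketch below combines all three checks into one quick audit. It is a minimal illustration, not a MAKER tool, and both function names are hypothetical. It makes two assumptions:

- The datastore uses the same three-level layout that Carson's glob matches.
- The index file holds tab-separated lines of contig ID, directory, and status, with the STARTED/FINISHED statuses mentioned above.

```python
import glob
import os


def count_transcripts(datastore_dir):
    """Return (number of *trans*fasta files, number of FASTA records in them).

    Mirrors Carson's glob <datastore>/*/*/*/*trans*fasta, so it does not
    depend on contig names starting with "scaffold".
    """
    pattern = os.path.join(datastore_dir, "*", "*", "*", "*trans*fasta")
    files = sorted(glob.glob(pattern))
    n_records = 0
    for path in files:
        with open(path) as fh:
            # Same idea as `grep -c '^>'`: one header line per sequence.
            n_records += sum(1 for line in fh if line.startswith(">"))
    return len(files), n_records


def unfinished_contigs(index_path):
    """Return contigs whose most recent index status is not FINISHED.

    Assumes tab-separated lines of: contig ID, directory, status, with
    STARTED written when a contig begins and FINISHED when it completes.
    Any other final status is also reported, for manual inspection.
    """
    last_status = {}
    with open(index_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # blank or badly truncated line
            last_status[fields[0]] = fields[2]
    return sorted(c for c, status in last_status.items() if status != "FINISHED")
```

Here is how to read the results:

- When every contig directory sits at that depth, `count_transcripts("opgenResult+scaffoldsLengthsLess200_datastore")` should agree with the ls/grep counts.
- A non-empty list from `unfinished_contigs()` points either to contigs worth rerunning or to an index worth rebuilding with `-dsindex`, as Carson suggests.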
From cjfields at illinois.edu Thu Mar 13 15:04:23 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 13 Mar 2014 21:04:23 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To:
References:
Message-ID:

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints": it looks as if geneid can take some hints, such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file. I assume it could take protein2genome/est2genome alignments as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas wrote:

One of the problems I have found with GeneMark is that it does not understand a soft-masked genome, so its self-training is incorrect. I have found a marked improvement in GeneMark's predictions by running the training on a hard-masked genome.

On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt wrote:

Adding a new predictor can take some time, since it obviously requires some coding. It's usually not too hard just to convert the results to GFF3 and then pass them in. Integrated support is really only beneficial for predictors that can take "hints" from evidence alignments (for example, we are working on EVM integration right now: http://evidencemodeler.sourceforge.net). If SNAP and GeneMark give problems, just drop them. GeneMark really doesn't work very well on genomes with complex intron/exon structure (and I really wouldn't use it for anything but fungi).

Make sure you are also giving sufficient protein evidence, perhaps all proteins from chicken and pigeon, for example. Then you shouldn't lose any true genes if you are just using Augustus. Also, try not to use gene count as an indicator of performance. The value is very deceptive, especially if the genome assembly is fragmented.

Thanks,
Carson

On 3/10/14, 8:52 AM, "Fields, Christopher J" wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian genome. Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides. However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions. Leaving out SNAP altogether, however, leads to a marked decrease in the number of models overall, which is worse. GeneMark had a very similar problem (a high number of false positives) and thus gave no marked improvement, whether used with both Augustus and SNAP or with Augustus alone.
>
>I have been exploring geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked on in the past. Its predictions would be fed into MAKER as external GFF, but I wanted to see if anyone has tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into MAKER? For instance, would it be possible to add them as a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From jfierst at uoregon.edu Fri Mar 14 10:06:26 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 09:06:26 -0700
Subject: [maker-devel] associating gene names between related strains
Message-ID:

Hi,

we are assembling and annotating genomes for several related strains of Caenorhabditis worms, and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option, but I can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna

From dence at genetics.utah.edu Fri Mar 14 11:32:02 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Mar 2014 17:32:02 +0000
Subject: [maker-devel] associating gene names between related strains
In-Reply-To:
References:
Message-ID:

Hi Janna,

So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a FASTA file from one strain, and then use est_forward to carry that ID into the gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
From jfierst at uoregon.edu Fri Mar 14 12:01:16 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 11:01:16 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To:
References:
Message-ID:

I will try it today. Thanks for the quick reply!
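Daniel's reply points to maker_map_ids. Carson Holt's reply, which follows, describes its input as a two-column file (original ID, new ID) and recommends building it from reciprocal best BLAST hits. The Python sketch below shows only that reciprocal-best-hit step, under these assumptions:

- Both BLASTP searches were saved in BLAST+ tabular format (`-outfmt 6`, with the bitscore in column 12).
- "Best" means the highest bitscore, and ties keep the first hit seen.

The function and file names are hypothetical. MAKER protein FASTA headers usually carry transcript-level IDs, so decide whether your map should work at the gene level or the transcript level before using it.

```python
import csv


def best_hits(rows):
    """Map each query ID to its highest-bitscore subject ID.

    Expects BLAST+ -outfmt 6 rows: qseqid, sseqid, ..., bitscore (column 12).
    Blank rows and '#' comment rows are skipped; ties keep the first hit seen.
    """
    best = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted((m, r) for m, r in forward.items() if reverse.get(r) == m)


def read_tabular(path):
    """Read a tab-separated BLAST report into a list of rows."""
    with open(path, newline="") as fh:
        return list(csv.reader(fh, delimiter="\t"))


def write_id_map(pairs, out_path):
    """Write the two-column map: column 1 original ID, column 2 new ID."""
    with open(out_path, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(pairs)
```

A typical call would be `write_id_map(reciprocal_best_hits(read_tabular("maker_vs_ref.tsv"), read_tabular("ref_vs_maker.tsv")), "maker_to_ref.map")`. Both .tsv files would come from `blastp ... -outfmt 6` runs in each direction. Each line of the output pairs a MAKER ID with its reference ortholog, as in Carson's example of maker-gene-1 translating into smug1.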
From carsonhh at gmail.com Fri Mar 14 12:02:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:02:48 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To:
References:
Message-ID:

maker_map_ids does a translation (e.g., change gene-A to smug1), so you need to know which genes you want to translate names to (two-column input file: column 1 -> original ID, column 2 -> new ID). I'm not sure est_forward is the best way to do this, although I do think maker_map_ids is the tool to use in the end. The question is how to make the list of IDs to translate as the input to maker_map_ids.

I would actually just use BLASTP against the reference strain and then take the reciprocal best BLAST hits. To do this, BLAST your reference proteins against your maker proteins. Then do the opposite: BLAST your maker proteins against your reference proteins. If each is the other's best hit, then they are orthologous, and you can safely make a two-column entry for the maker_map_ids input (e.g., maker-gene-1 translates into smug1).

--Carson

From carsonhh at gmail.com Fri Mar 14 12:43:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:43:41 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
References: <3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com> <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
Message-ID:

I'm glad you were able to fix it. I'll check to see why it was failing as well.

Thanks,
Carson

From: dhivya arasappan
Date: Friday, March 14, 2014 at 10:16 AM
To: Carson Holt
Subject: Re: Error when running maker2zff script

Kindly ignore my previous question. I was able to manipulate the scaffold names in the GFF file to get maker2zff to work.
Thanks,
dhivya

On Mar 14, 2014, at 10:55 AM, dhivya arasappan wrote:

> My message got flagged by the maker list again, so I'm forwarding this to you separately. Is there a better way to send biggish files?
>
> Thank you,
> Dhivya
>
> Begin forwarded message:
>
>> From: dhivya arasappan
>> Subject: Error when running maker2zff script
>> Date: March 13, 2014 at 8:35:27 PM CDT
>> To: Carson Holt, maker-devel at yandell-lab.org
>>
>> Hi Carson,
>>
>> I used gff3_merge to create my GFF file from the maker output. I've attached it here. But when I run maker2zff on it, I get the following error:
>>
>> Can't use an undefined value as an ARRAY reference at /opt/apps/maker/2.30/bin/maker2zff line 177, line 7294251.
>>
>> It produces an incomplete output file, and it looks like it may be running into problems when it encounters scaffold3%2F0. I'm wondering if it's having problems with my scaffold names. There seem to be some inconsistencies, because the scaffold is referred to as both scaffold3%2F0 and scaffold3/0 in the GFF file. It goes through other scaffolds, like SCAFFOLD3_873 and SCAFFOLD3_95, just fine. I did try replacing the scaffold names in the GFF file, but I still get the same error. Any ideas?
>>
>> Substitution command I used, for your reference: sed 's/3\%2F/3_/g' gfffile| sed 's/\//\_/' > mod.gfffile
>>
>> Thanks,
>> Dhivya

From carsonhh at gmail.com Fri Mar 14 13:25:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 13:25:58 -0600
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To:
References:
Message-ID:

We can look into it.
--Carson
From cjfields at illinois.edu Fri Mar 14 20:22:55 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 15 Mar 2014 02:22:55 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To:
References:
Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via GFF for now. If needed, I can work off an svn checkout and send along a patch, should I ever manage to eke out time to work on it.

chris
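Earlier in this thread, Sajeet Haridas reported that GeneMark self-training goes wrong on a soft-masked genome, and that training on a hard-masked copy gave a marked improvement. The Python sketch below shows that preprocessing step. It is a minimal illustration that assumes repeats are soft-masked as lowercase bases, the usual convention. Every lowercase base becomes N, and header lines pass through untouched. The function names are hypothetical.

```python
def hard_mask_fasta(lines):
    """Yield FASTA lines with soft-masked (lowercase) bases replaced by 'N'.

    Header lines (starting with '>') are passed through unchanged; the
    trailing newline of each sequence line is preserved.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if ch.islower() else ch for ch in line)


def hard_mask_file(src_path, dst_path):
    """Write a hard-masked copy of src_path to dst_path (e.g. for predictor training)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(hard_mask_fasta(src))
```

Because the function is a generator, large assemblies stream through line by line rather than being loaded into memory at once.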
From carson.holt at genetics.utah.edu Mon Mar 17 13:45:15 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Mar 2014 19:45:15 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts
In-Reply-To:
References:
Message-ID:

I have attached 4 files for you to place in the .../maker/Widgets/ directory. The *blast.pm files will suppress the BLAST+ failures you are getting (alternatively, you can just downgrade to BLAST+ 2.2.27 to get the same effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. In the latest release, NCBI redid all their warnings and error codes, so it spits out a lot of garbage and fails with different messages than it did before. For example, BLAST+ now warns you every time it encounters a FASTA header with a comment (virtually every FASTA entry in existence falls into this category), so your screen will be awash with meaningless warning messages.

The fgenesh.pm file will fix the other failure, which only occurs if you use FGENESH together with the correct_est_fusion=1 option. No other predictors are affected.

Thanks,
Carson

On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote:

>Dear Carson
>
>Sorry for the late reply; I was away for a couple of days. I have uploaded the output files, plus the control and error output, to the FTP site that you provided.
>The user ID is borhanh
>
>I used BLAST+ for this run.
>
>Regards
>
>HB
>
>On 14-03-13 10:00 AM, "Carson Holt" wrote:
>
>>Just resending this to the correct maker-devel address. When replying, please do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" wrote:
>>
>>>FGENESH is not a heavily used tool, so depending on which version it is (either too old or too new), its output might be slightly different, which could cause incorrect parsing. Could you tar up your maker.output folder and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi (then send me your user/guest ID after you upload)?
>>>
>>>For the BLAST error, use BLAST+ instead. You are using blastall, which is the old legacy version of NCBI BLAST. You can switch by setting the BLAST type in maker_bopts.ctl and the location of the executables in maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" wrote:
>>>
>>>>Dear Maker users
>>>>
>>>>I ran maker (2.31) on a fungal genome and found that it inserted the word SCALAR followed by a hexadecimal value in parentheses, like this: (0x22de7020), into the nucleotide sequence of some of the genes. This seems to be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>Here is an example for one of the genes:
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was used for the first round of maker (2.10) without such a problem. I checked the sequence of the scaffold related to one of the affected transcripts, and there was no error in the sequence. I am not sure what is causing this. The only error that I could spot in the output error file is the following:
>>>>
>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences.
>>>>
>>>>Your help is appreciated
>>>>
>>>>HB

[Attachments scrubbed by the archive: blastn.pm (text/x-perl-script, 8112 bytes); blastx.pm (text/x-perl-script, 8218 bytes); fgenesh.pm (text/x-perl-script, 19744 bytes); tblastx.pm (text/x-perl-script, 9113 bytes)]

From carsonhh at gmail.com Mon Mar 17 15:14:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Mar 2014 15:14:42 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To:
References:
Message-ID:

Just an update on this. I've fixed the maker2zff script to handle the issues seen. Looking at this actually brought another issue to light: the escape-character specification for GFF3 is inconsistent across column 1 (the seqid), column 9 (the ID attribute and the target ID in the Target attribute), and the FASTA IDs of the sequences embedded in the file. We're updating the GFF3 spec to clarify this, so that the same ID gets the same character escaping everywhere it appears.
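The escaping inconsistency Carson describes is what tripped maker2zff earlier in this thread: the same contig appeared both as scaffold3%2F0 and as scaffold3/0. The Python sketch below shows one way to normalize such IDs before running MAKER or its helper scripts. It assumes the GFF3 column 1 safe set [a-zA-Z0-9.:^*$@!+_?-|], which this message goes on to quote from the spec. It is illustrative only, and the helper names are hypothetical. Any renaming has to be applied consistently to the assembly FASTA and to every GFF3 file that refers to it.

```python
import string
from urllib.parse import unquote

# Characters the GFF3 spec allows unescaped in column 1 (seqid):
# [a-zA-Z0-9.:^*$@!+_?-|]
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + ".:^*$@!+_?-|")


def is_safe_seqid(seqid):
    """True if the ID is non-empty and needs no escaping at all."""
    return bool(seqid) and all(ch in SAFE_CHARS for ch in seqid)


def escape_seqid(raw):
    """Percent-encode (as UTF-8 bytes) every character outside the safe set."""
    out = []
    for ch in raw:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def normalize_seqid(seqid, replacement="_"):
    """Decode any %XX escapes, then replace unsafe characters outright.

    Both spellings of the same contig ("scaffold3%2F0" and "scaffold3/0")
    map to one ID that no downstream tool should need to escape.
    """
    decoded = unquote(seqid)
    return "".join(ch if ch in SAFE_CHARS else replacement for ch in decoded)
```

Replacing characters outright can make two distinct IDs collide (for example, a/b and a_b). After renaming, check that the normalized names are still unique.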
To be safe though, only use these characters in your contig IDs for the assembly when using any tool that reads or outputs GFF3 --> a-zA-Z0-9.:^*$@!+_?-| Any character not in that set has a high chance of breaking some downstream tool. For now, just assume that the strict interpretation the GFF3 spec gives for column 1 must be applied to all IDs everywhere (see below). >>Column 1: "seqid" >>The ID of the landmark used to establish the coordinate system for the >>current feature. >>IDs may contain any characters, but must escape any characters not in >>the set [a-zA-Z0-9.:^*$@!+_?-|]. >>In particular, IDs may not contain unescaped whitespace and must not >>begin with an unescaped ">". Thanks, Carson On 3/13/14, 7:35 PM, "dhivya arasappan" wrote: >Hi Carson, > >I used gff3_merge to create my gff file from maker output. I've >attached it here. But when I run maker2zff on it, I get the following >error: > >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/ >2.30/bin/maker2zff line 177, line 7294251. > >It produces an incomplete output file and it looks like it may be >running into problems when it encounters scaffold3%2F0. I'm wondering >if its having problems with my scaffold names. There seem to be some >inconsistencies because it's referred to as scaffold3%F0 and >scaffold3/0 in the gff file. It goes through other scaffolds like >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine. I did try replacing the >scaffold names in the gff file, but still get the same error. Any >ideas? > >Substitution command I used, for your reference: sed 's/3\%2F/3_/g' >gfffile| sed 's/\//\_/' > mod.gfffile > >Thanks >Dhivya > From darasappan at gmail.com Mon Mar 17 15:20:18 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 17 Mar 2014 16:20:18 -0500 Subject: [maker-devel] Error when running maker2zff script In-Reply-To: References: Message-ID: Awesome! Thanks Carson. Dhivya From marc.hoeppner at bils.se Tue Mar 18 05:43:43 2014 From: marc.hoeppner at bils.se (Marc Höppner) Date: Tue, 18 Mar 2014 12:43:43 +0100 Subject: [maker-devel] Maker changes 2.30-2.31 Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Hi, I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere with information on what, if anything, was changed between 2.30 and 2.31? There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc., and were run on the same data. 2.30 is running along happily; 2.31, however, has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run. I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently. Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at bils.se From carsonhh at gmail.com Tue Mar 18 09:07:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 09:07:07 -0600 Subject: [maker-devel] Maker changes 2.30-2.31 In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se> Message-ID: Attached. Also make sure you are using the tarball from the lab website and not the prerelease from the Subversion repository.
Thanks, Carson -------------- next part -------------- r1060 | cholt | 2013-11-04 11:18:12 -0700 (Mon, 04 Nov 2013) | MAKER stable release version 2.30 r1061 | cholt | 2013-11-10 22:19:51 -0700 (Sun, 10 Nov 2013) | altered build install slightly r1062 | cholt | 2013-11-25 09:33:16 -0700 (Mon, 25 Nov 2013) | updated fgenesh for hint based annotation error r1063 | cholt | 2013-12-05 14:10:42 -0700 (Thu, 05 Dec 2013) | fix repeat too short output error r1064 | cholt | 2013-12-05 14:18:04 -0700 (Thu, 05 Dec 2013) | updated installation scripts r1065 | cholt | 2013-12-13 08:42:08 -0700 (Fri, 13 Dec 2013) | fix fully masked failure for BLAST 2.2.25 r1066 | cholt | 2014-01-09 10:45:08 -0700 (Thu, 09 Jan 2014) | update MWAS and maker2jbrowse r1067 | cholt | 2014-01-09 11:34:18 -0700 (Thu, 09 Jan 2014)
| fix invalid character in Ecoli example fasta r1068 | cholt | 2014-01-24 10:42:15 -0700 (Fri, 24 Jan 2014) | added iprscan to maker.css for MWAS r1070 | cholt | 2014-01-26 20:27:52 -0700 (Sun, 26 Jan 2014) | attempt to fix ipr_update issues with Name ne to ID and fix lock with GFF3DB as well as docs for JBrowse and MAKER install r1071 | cholt | 2014-01-26 20:41:55 -0700 (Sun, 26 Jan 2014) | alter install to hide MWAS fix skip of small contigs and map forward of genes with est_forward r1072 | cholt | 2014-01-28 11:20:41 -0700 (Tue, 28 Jan 2014) | added message to get user to use the correct maker executable and updated INSTALL docs r1073 | cholt | 2014-01-28 11:36:19 -0700 (Tue, 28 Jan 2014) | further update to maker from wrong directory message when name has whitespace r1074 | cholt | 2014-02-03 14:48:05 -0700 (Mon, 03 Feb 2014) | fixed segfault on exit for OpenMPI r1075 | cholt | 2014-02-03 15:32:38 -0700 (Mon, 03 Feb 2014) | added support for optional test compiler flags to be used with MVAPICH2 r1076 | cholt | 2014-02-03 15:38:52 -0700 (Mon, 03 Feb 2014) | fixed build commit missing m option r1077 | cholt | 2014-02-04 14:29:43 -0700 (Tue, 04 Feb 2014) | made MPI communication always serialize r1078 | cholt | 2014-02-05 11:23:10 -0700 (Wed, 05 Feb 2014) | updated MPI calling to use probe for size rather than another message for faster performance r1079 | cholt | 2014-02-06 08:29:45 -0700 (Thu, 06 Feb 2014) | fixed labeling bug, fixed hanging MPI calls, fixed trnascan introns, and length r1080 | cholt | 2014-02-11 10:08:33 -0700 (Tue, 11 Feb 2014) | switch FindBin::Bin for FindBin::RealBin throughout r1081 | cholt | 2014-02-11 10:49:24 -0700 (Tue, 11 Feb 2014) | MAKER stable release version 2.31 From fbarreto at ucsd.edu Tue Mar 18 10:08:47 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:08:47 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP Message-ID: Hi, all, I've been learning a lot from reading posts from this 
group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs. From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on BLAST alignments to Swiss-Prot (based on query-subject coverage, bit score, etc.). I then used only these 2000 ESTs in a first Maker run using est2genome=1. The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point). I now plan on training SNAP with this first output, and then doing another Maker run using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course). Does this sound like a reasonable approach? Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step. Thanks for any insight! -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 From carsonhh at gmail.com Tue Mar 18 10:14:29 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 10:14:29 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: That sounds good. 1,500 initial models should be more than sufficient for the first round of training.
--Carson -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Tue Mar 18 10:16:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 18 Mar 2014 16:16:20 +0000 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: Hi Felipe, I think 1500 models is a good-sized set with which to train SNAP; I think SNAP expects ~1000 models for training. The only other comment on the approach is that using only one ab initio predictor is a little bit risky. Using multiple predictors would allow MAKER to choose, from among their different models, the one that best fits the evidence. Good luck, and let us know if there's anything we can help with! Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From barry.utah at gmail.com Tue Mar 18 10:26:45 2014 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 18 Mar 2014 10:26:45 -0600 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: References: Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Hi Felipe, I think that plan sounds quite reasonable. To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on. Since you're an order of magnitude above that, I think you're in good shape. Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing. Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor. Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.
This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER. Thanks, B Barry Moore Research Scientist Dept.
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From fbarreto at ucsd.edu Tue Mar 18 10:59:39 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Tue, 18 Mar 2014 09:59:39 -0700 Subject: [maker-devel] Size of initial EST training set for SNAP In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> References: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu> Message-ID: Thanks, guys, for the swift and informative response! I will try to train Augustus again, but at the very least, I will include it with an arthropod HMM in my final run (in addition to my custom SNAP HMM). Cheers, Felipe -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 From darasappan at gmail.com Tue Mar 18 13:27:11 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 14:27:11 -0500 Subject: [maker-devel] maker snap output files Message-ID: Hello, I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial). It ran successfully, and when I ran fasta_merge, I got several output FASTA files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help. *maker.proteins.fasta *maker.snap_masked.proteins.fasta *maker.non_overlapping_ab_initio.proteins.fasta What is the difference among these? They all have different numbers of sequences. Similarly, with transcripts: maker.non_overlapping_ab_initio.transcripts.fasta maker.snap_masked.transcripts.fasta maker.transcripts.fasta Thanks Dhivya -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Tue Mar 18 13:34:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 13:34:05 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want) maker.snap_masked.proteins.fasta - these are the raw, unfiltered SNAP ab initio predictions (for reference purposes) maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use InterProScan (very slow) to analyze this file for false negatives. These files are also described in the "MAKER OUTPUT" section of the README distributed with MAKER. Thanks, Carson From darasappan at gmail.com Tue Mar 18 14:05:39 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 18 Mar 2014 15:05:39 -0500 Subject: [maker-devel] maker snap output files In-Reply-To: References: Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Thanks Carson.
Is it normal that in my maker results after running SNAP, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-SNAP maker results? I assumed that annotation through alignment plus annotation through prediction would yield more annotations. The unfiltered proteins file does have more proteins, though. Thanks Dhivya From carsonhh at gmail.com Tue Mar 18 14:09:01 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 18 Mar 2014 14:09:01 -0600 Subject: [maker-devel] maker snap output files In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> Message-ID: There can also be hint-based predictions. The two result sets may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over-predict (by as much as 10-fold). You should always review your annotations in something like Apollo to see how the models compare to the evidence. Counts alone don't really mean anything. Thanks, Carson From chrisbioinfo at gmail.com Wed Mar 19 05:09:57 2014 From: chrisbioinfo at gmail.com (Chris Bioinfo) Date: Wed, 19 Mar 2014 12:09:57 +0100 Subject: [maker-devel] Annotation with maker2 Message-ID: Hello, I'm installing/using maker2 for the first time and I get an error when using it. I am certainly missing something, but I don't know what. I compiled maker with no error message and I have all these directories after compilation: bin data GMOD INSTALL lib LICENSE MWAS perl README src Nevertheless, when I try maker2 on the test data (dpp_contig.fasta) I get this error: STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

ideas?

Best,
Christelle

From carsonhh at gmail.com Wed Mar 19 07:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: 
References: 
Message-ID: 

Your problem is one of the following: you need to reinstall the DBD::SQLite module; you are running in a directory you don't have permissions for; you set your TMPDIR environment variable, or the TMP value in maker_opts.ctl, to an NFS-mounted or memory-mounted directory; or you are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that has issues (probably with the DB or SQLite modules).

You should look at each of those first. You can also completely delete the output directory and start again, to see whether it was just a random error. If none of those seems to be the issue, run MAKER with the --debug command line flag and send me the output.

Thanks,
Carson

From rbharris at uw.edu Wed Mar 19 19:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID: 

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once, and used the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time, probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times, and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output?

Thanks,
Rebecca
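A minimal sketch for the pruning question, assuming a plain (uncompressed) FASTA of scaffolds: it reports the scaffold N50 and writes out only scaffolds at or above a length cutoff. This is not a MAKER tool; the script name prune_scaffolds.py and the 10 kbp default are illustrative assumptions (the reply that follows suggests that scaffolds under 5 or 10 kbp add little to the gene count).

```python
# prune_scaffolds.py: illustrative sketch, not part of MAKER.
# Assumptions: plain (uncompressed) FASTA input; the 10 kbp default cutoff is
# a placeholder chosen from the 5-10 kbp range mentioned on the list.
import sys
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the bases."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def keep_long(records: Iterable[Record], min_len: int = 10_000) -> List[Record]:
    """Return only the records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def write_fasta(records: Iterable[Record], out=sys.stdout, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    cutoff = int(sys.argv[2]) if len(sys.argv) > 2 else 10_000
    with open(sys.argv[1]) as handle:
        records = list(read_fasta(handle))
    kept = keep_long(records, cutoff)
    print(f"input: {len(records)} scaffolds, N50 {n50([len(s) for _, s in records])}; "
          f"kept {len(kept)} >= {cutoff} bp", file=sys.stderr)
    write_fasta(kept)
```

Usage would be something like: python prune_scaffolds.py genome.fasta 10000 > genome.min10k.fasta. Holding every record in memory keeps the sketch short. For a very large assembly, a streaming version that reads the file twice would use less RAM.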
From dence at genetics.utah.edu Wed Mar 19 23:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: 
References: 
Message-ID: 

Hi Rebecca,

So, as far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 or 10 kbp won't contribute much to the gene counts in the end.

Also, if you can, try to avoid using the altest option. It works completely fine, but BLASTing those sequences (with tblastx) takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look.

I don't know how to force maker to create fewer files, and you definitely want to be able to use the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com Thu Mar 20 11:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: 
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts, and transcripts that don't overlap, which ones are reflected in the gff file? The IDs in the gff file (for exons, genes, mRNA) all say something like "*snap-gene", so does this mean these are the genes from the SNAP prediction tool?

Thanks
dhivya

On Mar 18, 2014, at 3:09 PM, Carson Holt wrote:

> There can also be hint-based predictions. They may be similar in size, but there is no rule. Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over-predict (by as much as 10-fold). You should always review your annotations in something like Apollo, to see how the models compare to the evidence. Just counts don't really mean anything.
>
> Thanks,
> Carson
>
> From: dhivya arasappan
> Date: Tuesday, March 18, 2014 at 2:05 PM
> To: Carson Holt
> Cc:
> Subject: Re: maker snap output files
>
> Thanks Carson.
>
> Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-snap maker results? I assumed that annotation through alignment plus annotation through prediction would give more annotations.
>
> The unfiltered proteins file has more proteins though.
>
> Thanks
> Dhivya
>
> On Mar 18, 2014, at 2:34 PM, Carson Holt wrote:
>
>> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered SNAP ab initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here. Sometimes people use InterProScan (very slow) to analyze this file for false negatives.
>>
>> These files are also described in the README distributed with MAKER, in the "MAKER OUTPUT" section.
>>
>> Thanks,
>> Carson
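To compare the fasta_merge outputs the way Dhivya did, here is a small Python sketch that counts records (header lines starting with '>') in each file given on the command line. As Carson points out above, counts alone say little about annotation quality; this only makes the comparison quick and repeatable. The script name count_records.py is a hypothetical placeholder, and the file patterns in the comment are the ones from Dhivya's message.

```python
# count_records.py: illustrative sketch for comparing fasta_merge outputs.
# Assumption: one record per header line beginning with '>'.
import sys
from typing import Dict, Iterable


def count_headers(lines: Iterable[str]) -> int:
    """Number of FASTA records, i.e. lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_files(paths: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA path to its record count."""
    counts = {}
    for path in paths:
        with open(path) as handle:
            counts[path] = count_headers(handle)
    return counts


if __name__ == "__main__":
    # e.g. python count_records.py *maker.proteins.fasta \
    #        *maker.snap_masked.proteins.fasta \
    #        *maker.non_overlapping_ab_initio.proteins.fasta
    for path, n in count_files(sys.argv[1:]).items():
        print(f"{n}\t{path}")
```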
From carsonhh at gmail.com Thu Mar 20 11:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com> <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID: 

The MAKER transcripts are the gene/mRNA/exon/CDS features. All other transcripts (from SNAP, etc.) are match/match_part features in the GFF3. When you look at these in something like Apollo, they will be placed in different viewing panels based on their type.

Thanks,
Carson
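A minimal way to check Carson's point against a real GFF3 file is to tally features by source (column 2) and type (column 3). MAKER gene/mRNA/exon/CDS models then show up separately from match/match_part evidence. This sketch assumes a tab-delimited GFF3 file and stops reading at a ##FASTA directive if the file has one. The script name gff_tally.py is hypothetical, and the exact source labels depend on your run.

```python
# gff_tally.py: illustrative sketch, counts GFF3 features by source and type.
# Assumptions: tab-delimited GFF3; anything after a '##FASTA' directive is
# sequence data and is skipped.
import sys
from collections import Counter
from typing import Iterable


def tally_features(lines: Iterable[str]) -> Counter:
    """Count (source, type) pairs, i.e. GFF3 columns 2 and 3."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is FASTA, not features
        if line.startswith("#") or not line.strip():
            continue  # directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[(cols[1], cols[2])] += 1
    return counts


if __name__ == "__main__":
    # e.g. python gff_tally.py genome.all.gff
    for path in sys.argv[1:]:
        with open(path) as handle:
            for (source, ftype), n in sorted(tally_features(handle).items()):
                print(f"{n}\t{source}\t{ftype}")
```

Lines without exactly nine tab-separated columns are skipped rather than reported. This is a deliberate simplification for a quick overview, not a GFF3 validator.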
From carsonhh at gmail.com Thu Mar 20 11:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: 
References: 
Message-ID: 

You may also want to try the GFF3 pass-through options. Basically, you give the GFF3 file from your past run to maker_gff and tell MAKER which kinds of evidence to keep from that run by setting the corresponding 'pass' options to 1. Then you can run without your FASTA file inputs for ESTs, proteins, and repeats (also blank out the RepeatMasker species). The values will be passed forward from the GFF3 file into the current run.

--Carson

From carsonhh at gmail.com Fri Mar 21 08:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: 
References: 
Message-ID: 

Glad it's working. Let us know if anything else comes up.

--Carson

From: Chris Bioinfo
Date: Friday, March 21, 2014 at 4:57 AM
To: Carson Holt
Subject: Re: [maker-devel] Annotation with maker2

Dear Carson,

It works!! It took many attempts: I installed SQLite 3.8.4.1 yesterday, and things were "better" (no error message when launching sqlite3). Yet my test.db was still not created. Today I found the trick! The problem was only that the path where the db was being created was too long.

Thanks for your time and your help, Carson!
All the best,
Christelle

2014-03-20 18:21 GMT+01:00 Carson Holt :

> Also, you can use this command line to test both before and after installing:
>
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
>
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in the error.
>
> --Carson
>
> From: Carson Holt
> Date: Thursday, March 20, 2014 at 11:03 AM
> To: Chris Bioinfo
> Subject: Re: [maker-devel] Annotation with maker2
>
> The failure is in SQLite, so you have to reinstall, i.e. run 'force install DBD::SQLite' in CPAN. Otherwise you are just keeping whatever module is installed, which may have broken C bindings.
>
> You may also have to install SQLite 3.8.4.1, and then reinstall the Perl modules using the force option to force a recompile.
>
> --Carson
>
> From: Chris Bioinfo
> Date: Thursday, March 20, 2014 at 10:57 AM
> To: Carson Holt
> Subject: Re: [maker-devel] Annotation with maker2
>
> cpan[2]> install DBI
> DBI is up to date (1.631).
>
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
>
> my test.db is indeed not created:
>
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite>
>
> 2014-03-20 17:36 GMT+01:00 Carson Holt :
>> I'm actually checking the mount points for the disk. SQLite won't work on filesystems that don't implement locks, and 'df' is a good way to infer some of that info.
>>
>> Basically, I still think this is SQLite failing on your system. You might need to reinstall SQLite and then reinstall the Perl DBI and DBD::SQLite modules.
>>
>> You can also run a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>>
>> This will work if you have sqlite3 installed, and any error it gives may be informative.
>>
>> --Carson
>>
>> From: Chris Bioinfo
>> Date: Thursday, March 20, 2014 at 10:29 AM
>> To: Carson Holt
>> Subject: Re: [maker-devel] Annotation with maker2
>>
>> oh sorry
>>
>> my disks are quite full, but I guess there is still space for maker
>>
>> /dev/sdc1 19T 18T 934G 95% /home
>>
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo :
>>> this:
>>>
>>> du -h dpp_contig.maker.output/
>>> 0 dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K dpp_contig.maker.output/maker_bopts.log
>>> 4.0K dpp_contig.maker.output/maker_exe.log
>>> 8.0K dpp_contig.maker.output/maker_opts.log
>>> 16K dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M dpp_contig.maker.output/mpi_blastdb
>>> 0 dpp_contig.maker.output/seen.dbm
>>>
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt :
>>>
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>>
>>>> --Carson
>>>>
>>>> From: Chris Bioinfo
>>>> Date: Thursday, March 20, 2014 at 10:00 AM
>>>> To: Carson Holt
>>>> Subject: Re: [maker-devel] Annotation with maker2
>>>>
>>>> sorry, wrong directory!
>>>>
>>>> I have these files:
>>>> dpp_contig_datastore dpp_contig_master_datastore_index.log maker_bopts.log maker_exe.log maker_opts.log mpi_blastdb seen.dbm
>>>>
>>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo :
>>>>> no,
>>>>>
>>>>> I have these files in the directory:
>>>>> dpp_contig.fasta dpp_est.fasta hsap_contig.fasta hsap_protein.fasta maker_exe.ctl
>>>>> dpp_contig.maker.output dpp_protein.fasta hsap_est.fasta maker_bopts.ctl maker_opts.ctl te_proteins.fasta
>>>>>
>>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt :
>>>>>
>>>>>> Did /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist?
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> From: Chris Bioinfo
>>>>>> Date: Thursday, March 20, 2014 at 9:50 AM
>>>>>> To: Carson Holt
>>>>>> Subject: Re: [maker-devel] Annotation with maker2
>>>>>>
>>>>>> cdantec at belem:~$ /usr/bin/perl -v
>>>>>>
>>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for x86_64-linux-gnu-thread-multi
>>>>>> (with 46 registered patches, see perl -V for more detail)
>>>>>>
>>>>>> Copyright 1987-2013, Larry Wall
>>>>>>
>>>>>> Perl may be copied only under the terms of either the Artistic License or the
>>>>>> GNU General Public License, which may be found in the Perl 5 source kit.
>>>>>>
>>>>>> Complete documentation for Perl, including FAQ lists, should be found on
>>>>>> this system using "man perl" or "perldoc perl". If you have access to the
>>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page.
>>>>>>
>>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt :
>>>>>>> What do you get when you type --> /usr/bin/perl -v
>>>>>>>
>>>>>>> The key to the error is this line -->
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...)
>>>>>>> failed: unable to open database file
>>>>>>>
>>>>>>> Either the database doesn't exist, or it is corrupt. Does it exist?
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> From: Chris Bioinfo
>>>>>>> Date: Thursday, March 20, 2014 at 9:25 AM
>>>>>>> To: Carson Holt
>>>>>>> Subject: Re: [maker-devel] Annotation with maker2
>>>>>>>
>>>>>>> Dear Carson,
>>>>>>>
>>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my directory, and configured the TMP value in maker_opts.ctl. perl is /usr/bin/perl.
>>>>>>> I have deleted the output directory many times, but I get the same problem.
>>>>>>>
>>>>>>> So here is the debug output:
>>>>>>> ****MODULE VERSION INFO
>>>>>>> 0.05 Acme::Damn /usr/local/lib/perl/5.18.1/Acme/Damn.pm
>>>>>>> 1.01 AnyDBM_File /usr/share/perl/5.18/AnyDBM_File.pm
>>>>>>> 5.73 AutoLoader /usr/share/perl/5.18/AutoLoader.pm
>>>>>>> UNKNOWN Bio::AnalysisParserI /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm
>>>>>>> UNKNOWN Bio::AnnotatableI /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm
>>>>>>> UNKNOWN Bio::Annotation::Collection /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm
>>>>>>> UNKNOWN Bio::Annotation::SimpleValue /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm
>>>>>>> UNKNOWN Bio::Annotation::TypeManager /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm
>>>>>>> UNKNOWN Bio::AnnotationCollectionI /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm
>>>>>>> UNKNOWN Bio::AnnotationI /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm
>>>>>>> 1.006923 Bio::DB::Fasta /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm
>>>>>>> UNKNOWN Bio::DB::InMemoryCache /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm
>>>>>>> UNKNOWN Bio::DB::IndexedBase /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm
>>>>>>> UNKNOWN Bio::DB::RandomAccessI /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm
>>>>>>> UNKNOWN Bio::DB::SeqI /usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm
>>>>>>> UNKNOWN Bio::DescribableI /usr/local/share/perl/5.18.1/Bio/DescribableI.pm
>>>>>>> UNKNOWN Bio::Event::EventGeneratorI /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm
>>>>>>> UNKNOWN Bio::Event::EventHandlerI /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm
>>>>>>> UNKNOWN Bio::Factory::ObjectFactory /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm
>>>>>>> UNKNOWN Bio::Factory::ObjectFactoryI /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm
>>>>>>> UNKNOWN Bio::Factory::SequenceFactoryI /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm
>>>>>>> UNKNOWN Bio::FeatureHolderI /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm
>>>>>>> UNKNOWN Bio::IdentifiableI /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm
>>>>>>> UNKNOWN Bio::LocatableSeq /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm
>>>>>>> UNKNOWN Bio::Location::Atomic /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm
>>>>>>> UNKNOWN Bio::Location::CoordinatePolicyI /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm
>>>>>>> UNKNOWN Bio::Location::Fuzzy /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm
>>>>>>> UNKNOWN Bio::Location::FuzzyLocationI /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm
>>>>>>> UNKNOWN Bio::Location::Simple /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm
>>>>>>> UNKNOWN Bio::Location::Split /usr/local/share/perl/5.18.1/Bio/Location/Split.pm
>>>>>>> UNKNOWN Bio::Location::SplitLocationI /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm
>>>>>>> UNKNOWN Bio::Location::WidestCoordPolicy /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm
>>>>>>> UNKNOWN Bio::LocationI /usr/local/share/perl/5.18.1/Bio/LocationI.pm
>>>>>>> UNKNOWN Bio::PrimarySeq /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm
>>>>>>> 1.006923 Bio::PrimarySeqI /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm
>>>>>>> UNKNOWN Bio::Range /usr/local/share/perl/5.18.1/Bio/Range.pm
>>>>>>> UNKNOWN Bio::RangeI /usr/local/share/perl/5.18.1/Bio/RangeI.pm
>>>>>>> 1.006923 Bio::Root::Exception /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm
>>>>>>> UNKNOWN Bio::Root::HTTPget /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm
>>>>>>> UNKNOWN Bio::Root::IO /usr/local/share/perl/5.18.1/Bio/Root/IO.pm
>>>>>>> 1.006923 Bio::Root::Root /usr/local/share/perl/5.18.1/Bio/Root/Root.pm
>>>>>>> 1.006923 Bio::Root::RootI /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm
>>>>>>> 1.006923 Bio::Root::Version /usr/local/share/perl/5.18.1/Bio/Root/Version.pm
>>>>>>> UNKNOWN Bio::Search::HSP::GenericHSP /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm
>>>>>>> UNKNOWN Bio::Search::HSP::HSPFactory /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm
>>>>>>> UNKNOWN Bio::Search::HSP::HSPI /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm
>>>>>>> 0.01 Bio::Search::HSP::PhatHSP::Base /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::augustus /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm
>>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastn /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm
>>>>>>> 0.01 Bio::Search::HSP::PhatHSP::blastx /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::cdna2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::est2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::fgenesh /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::genemark /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::gff3 /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::protein2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::repeatmasker /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snap /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::snoscan /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm
>>>>>>> 0.01 Bio::Search::HSP::PhatHSP::tblastx /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm
>>>>>>> UNKNOWN Bio::Search::HSP::PhatHSP::trnascan /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm
>>>>>>> 1.006923 Bio::Search::Hit::GenericHit /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm
>>>>>>> UNKNOWN Bio::Search::Hit::HitFactory /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm
>>>>>>> UNKNOWN Bio::Search::Hit::HitI /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm
>>>>>>> 0.01 Bio::Search::Hit::PhatHit::Base /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::augustus /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm
>>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastn /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm
>>>>>>> 0.01 Bio::Search::Hit::PhatHit::blastx /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::cdna2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::est2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::fgenesh /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::genemark /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::gff3 /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::protein2genome /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm
>>>>>>> 1.006923 Bio::Search::Hit::PhatHit::repeatmasker /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snap /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::snoscan /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm
>>>>>>> 0.01 Bio::Search::Hit::PhatHit::tblastx /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm
>>>>>>> UNKNOWN Bio::Search::Hit::PhatHit::trnascan /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm
>>>>>>> 1.006923 Bio::Search::SearchUtils /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm
>>>>>>> UNKNOWN Bio::SearchIO /usr/local/share/perl/5.18.1/Bio/SearchIO.pm
>>>>>>> UNKNOWN Bio::SearchIO::EventHandlerI /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm
>>>>>>> UNKNOWN Bio::SearchIO::SearchResultEventBuilder /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm
>>>>>>> UNKNOWN Bio::Seq /usr/local/share/perl/5.18.1/Bio/Seq.pm
>>>>>>> UNKNOWN Bio::Seq::SeqFactory /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm
>>>>>>> UNKNOWN Bio::SeqAnalysisParserI /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm
>>>>>>> UNKNOWN Bio::SeqFeature::FeaturePair /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm
>>>>>>> UNKNOWN Bio::SeqFeature::Generic /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm
>>>>>>> UNKNOWN Bio::SeqFeature::Similarity /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm
>>>>>>> UNKNOWN Bio::SeqFeature::SimilarityPair /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm
>>>>>>> UNKNOWN Bio::SeqFeatureI /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm
>>>>>>> UNKNOWN Bio::SeqI /usr/local/share/perl/5.18.1/Bio/SeqI.pm
>>>>>>> UNKNOWN Bio::SeqUtils /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm
>>>>>>> 1.006923 Bio::Tools::CodonTable /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm
>>>>>>> UNKNOWN Bio::Tools::GFF /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm
>>>>>>> 1.006923 Bio::Tools::IUPAC /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm
>>>>>>> 7.3 Bit::Vector /usr/local/lib/perl/5.18.1/Bit/Vector.pm
>>>>>>> 0.01 CGL::Annotation /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm
>>>>>>> 0.01 CGL::Annotation::Feature /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Contig /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Exon /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Gene /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Intron /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Protein /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Sequence_variant /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm
>>>>>>> 0.01 CGL::Annotation::Feature::Transcript /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm
>>>>>>> 0.01 CGL::Annotation::FeatureLocation /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm
>>>>>>> 0.01 CGL::Annotation::FeatureRelationship /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm
>>>>>>> 0.01 CGL::Annotation::Iterator /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm
>>>>>>> 0.01 CGL::Annotation::Trace /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm
>>>>>>> 0.01 CGL::Clone /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm
>>>>>>> 0.01 CGL::Ontology::Node /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm
>>>>>>> 0.01 CGL::Ontology::NodeRelationship /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm
>>>>>>> 0.01 CGL::Ontology::Ontology /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm
>>>>>>> 0.01 CGL::Ontology::Parser::OBO /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm
>>>>>>> 0.01 CGL::Ontology::SO /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm
>>>>>>> 0.01 CGL::Ontology::Trace /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm
>>>>>>> 0.01 CGL::Revcomp /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm
>>>>>>> 0.01 CGL::TranslationMachine /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm
>>>>>>> 1.32 Carp /usr/local/share/perl/5.18.1/Carp.pm
>>>>>>> 1.32 Carp::Heavy /usr/local/share/perl/5.18.1/Carp/Heavy.pm
>>>>>>> 0.64 Class::Struct /usr/share/perl/5.18/Class/Struct.pm
>>>>>>> 0.36 Clone /usr/local/lib/perl/5.18.1/Clone.pm
>>>>>>> 5.018001 Config /usr/lib/perl/5.18/Config.pm
>>>>>>> 3.40 Cwd /usr/lib/perl/5.18/Cwd.pm
>>>>>>> 1.42 DBD::SQLite /usr/local/lib/perl/5.18.1/DBD/SQLite.pm
>>>>>>> 1.631 DBI /usr/local/lib/perl/5.18.1/DBI.pm
>>>>>>> 1.827 DB_File /usr/lib/perl/5.18/DB_File.pm
>>>>>>> 2.145 Data::Dumper /usr/lib/perl/5.18/Data/Dumper.pm
>>>>>>> 0.11 Datastore::Base /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm
>>>>>>> 0.01 Datastore::MD5 /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm
>>>>>>> 2.53 Digest::MD5 /usr/local/lib/perl/5.18.1/Digest/MD5.pm
>>>>>>> 1.16 Digest::base /usr/share/perl/5.18/Digest/base.pm
>>>>>>>
>>>>>>> UNKNOWN Dumper::GFF::GFFV3 /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm
>>>>>>> UNKNOWN Dumper::XML::Game /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm
>>>>>>> UNKNOWN Dumper::XML::Game_Xml /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm
>>>>>>> 1.18 DynaLoader /usr/lib/perl/5.18/DynaLoader.pm
>>>>>>> 1.18 Errno /usr/lib/perl/5.18/Errno.pm
>>>>>>> 0.17015 Error /usr/local/annotation/maker2.31/bin/../lib/Error.pm
>>>>>>> UNKNOWN Error::Simple /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm
>>>>>>> 5.68 Exporter /usr/share/perl/5.18/Exporter.pm
>>>>>>> 5.68 Exporter::Heavy /usr/share/perl/5.18/Exporter/Heavy.pm
>>>>>>> UNKNOWN Fasta
/usr/local/annotation/maker2.31/bin/../lib/Fasta.pm >>>>>>> UNKNOWN FastaChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm >>>>>>> UNKNOWN FastaChunker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm >>>>>>> UNKNOWN FastaDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm >>>>>>> UNKNOWN FastaFile >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm >>>>>>> UNKNOWN FastaSeq >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm >>>>>>> 1.11 Fcntl /usr/lib/perl/5.18/Fcntl.pm >>>>>>> 2.84 File::Basename /usr/share/perl/5.18/File/Basename.pm >>>>>>> 2.26 File::Copy /usr/share/perl/5.18/File/Copy.pm >>>>>>> 1.20 File::Glob /usr/lib/perl/5.18/File/Glob.pm >>>>>>> 1.20 File::NFSLock >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm >>>>>>> 2.09 File::Path /usr/share/perl/5.18/File/Path.pm >>>>>>> 3.40 File::Spec /usr/lib/perl/5.18/File/Spec.pm >>>>>>> 3.40 File::Spec::Unix /usr/lib/perl/5.18/File/Spec/Unix.pm >>>>>>> 0.2304 File::Temp /usr/local/share/perl/5.18.1/File/Temp.pm >>>>>>> 1.09 File::Which >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm >>>>>>> 2.02 FileHandle /usr/share/perl/5.18/FileHandle.pm >>>>>>> 1.51 FindBin /usr/share/perl/5.18/FindBin.pm >>>>>>> UNKNOWN GFFDB >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> UNKNOWN GI /usr/local/annotation/maker2.31/bin/../lib/GI.pm >>>>>>> 2.42 Getopt::Long /usr/local/share/perl/5.18.1/Getopt/Long.pm >>>>>>> 6.02 HTTP::Date /usr/share/perl5/HTTP/Date.pm >>>>>>> 6.05 HTTP::Headers /usr/share/perl5/HTTP/Headers.pm >>>>>>> 6.06 HTTP::Message /usr/share/perl5/HTTP/Message.pm >>>>>>> 6.00 HTTP::Request /usr/share/perl5/HTTP/Request.pm >>>>>>> 6.04 HTTP::Response /usr/share/perl5/HTTP/Response.pm >>>>>>> 6.03 HTTP::Status /usr/share/perl5/HTTP/Status.pm >>>>>>> 1.28 IO /usr/lib/perl/5.18/IO.pm >>>>>>> 1.16 IO::File /usr/lib/perl/5.18/IO/File.pm >>>>>>> 1.34 IO::Handle /usr/lib/perl/5.18/IO/Handle.pm 
>>>>>>> 1.1 IO::Seekable /usr/lib/perl/5.18/IO/Seekable.pm >>>>>>> 1.21 IO::Select /usr/lib/perl/5.18/IO/Select.pm >>>>>>> 1.36 IO::Socket /usr/lib/perl/5.18/IO/Socket.pm >>>>>>> 1.33 IO::Socket::INET /usr/lib/perl/5.18/IO/Socket/INET.pm >>>>>>> 1.24 IO::Socket::UNIX /usr/lib/perl/5.18/IO/Socket/UNIX.pm >>>>>>> 1.13 IPC::Open3 /usr/share/perl/5.18/IPC/Open3.pm >>>>>>> 0.53 Inline /usr/local/share/perl/5.18.1/Inline.pm >>>>>>> UNKNOWN Inline::denter >>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm >>>>>>> UNKNOWN Iterator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm >>>>>>> UNKNOWN Iterator::Any >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm >>>>>>> UNKNOWN Iterator::Fasta >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm >>>>>>> UNKNOWN Iterator::GFF3 >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm >>>>>>> 6.05 LWP /usr/share/perl5/LWP.pm >>>>>>> UNKNOWN LWP::MemberMixin /usr/share/perl5/LWP/MemberMixin.pm >>>>>>> 6.00 LWP::Protocol /usr/share/perl5/LWP/Protocol.pm >>>>>>> 6.05 LWP::UserAgent /usr/share/perl5/LWP/UserAgent.pm >>>>>>> 0.33 List::MoreUtils >>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm >>>>>>> 1.38 List::Util /usr/local/lib/perl/5.18.1/List/Util.pm >>>>>>> UNKNOWN MAKER::ConfigData >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm >>>>>>> 1.32 POSIX /usr/lib/perl/5.18/POSIX.pm >>>>>>> 0.01 Parallel::Application::MPI >>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI >>>>>>> .pm >>>>>>> 0.02 Perl::Unsafe::Signals >>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm >>>>>>> UNKNOWN PhatHit_utils >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm >>>>>>> UNKNOWN PostData >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm >>>>>>> 1.0 Proc::ProcessTable_simple >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm >>>>>>> 1.0 Proc::Signal >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm >>>>>>> UNKNOWN Process::MpiChunk >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm >>>>>>> UNKNOWN Process::MpiTiers >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm >>>>>>> 1.38 Scalar::Util /usr/local/lib/perl/5.18.1/Scalar/Util.pm >>>>>>> 1.02 SelectSaver /usr/share/perl/5.18/SelectSaver.pm >>>>>>> UNKNOWN Shadower >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm >>>>>>> UNKNOWN SimpleCluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm >>>>>>> 2.009 Socket /usr/lib/perl/5.18/Socket.pm >>>>>>> UNKNOWN SpaceBase >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm >>>>>>> 2.45 Storable /usr/local/lib/perl/5.18.1/Storable.pm >>>>>>> 1.07 Symbol /usr/share/perl/5.18/Symbol.pm >>>>>>> 1.17 Sys::Hostname /usr/lib/perl/5.18/Sys/Hostname.pm >>>>>>> 0.21 Sys::SigAction >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm >>>>>>> UNKNOWN Sys::SigAction::Alarm >>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm >>>>>>> 4.02 Term::ANSIColor /usr/share/perl/5.18/Term/ANSIColor.pm >>>>>>> 4.2 Tie::Handle /usr/share/perl/5.18/Tie/Handle.pm >>>>>>> 1.04 Tie::Hash /usr/share/perl/5.18/Tie/Hash.pm >>>>>>> 4.3 Tie::StdHandle /usr/share/perl/5.18/Tie/StdHandle.pm >>>>>>> 1.9726 Time::HiRes /usr/local/lib/perl/5.18.1/Time/HiRes.pm >>>>>>> 1.2300 Time::Local /usr/share/perl/5.18/Time/Local.pm >>>>>>> 1.60 URI /usr/share/perl5/URI.pm >>>>>>> 3.31 URI::Escape /usr/share/perl5/URI/Escape.pm >>>>>>> UNKNOWN Widget >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm >>>>>>> UNKNOWN Widget::RepeatMasker >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm >>>>>>> UNKNOWN Widget::augustus >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastn >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm >>>>>>> >>>>>>> UNKNOWN Widget::blastx 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm >>>>>>> >>>>>>> UNKNOWN Widget::exonerate::cdna2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome. >>>>>>> pm >>>>>>> UNKNOWN Widget::exonerate::est2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.p >>>>>>> m >>>>>>> UNKNOWN Widget::exonerate::protein2genome >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2geno >>>>>>> me.pm >>>>>>> UNKNOWN Widget::fgenesh >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm >>>>>>> >>>>>>> UNKNOWN Widget::formater >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm >>>>>>> >>>>>>> UNKNOWN Widget::genemark >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm >>>>>>> >>>>>>> UNKNOWN Widget::snap >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm >>>>>>> >>>>>>> UNKNOWN Widget::snoscan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm >>>>>>> >>>>>>> UNKNOWN Widget::tblastx >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm >>>>>>> >>>>>>> UNKNOWN Widget::trnascan >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm >>>>>>> >>>>>>> 0.16 XSLoader /usr/share/perl/5.18/XSLoader.pm >>>>>>> 0.21 attributes /usr/lib/perl/5.18/attributes.pm >>>>>>> >>>>>>> 2.18 base /usr/share/perl/5.18/base.pm >>>>>>> 1.04 bytes /usr/share/perl/5.18/bytes.pm >>>>>>> UNKNOWN clean >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm >>>>>>> UNKNOWN cluster >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm >>>>>>> >>>>>>> UNKNOWN compare >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm >>>>>>> >>>>>>> 1.27 constant /usr/share/perl/5.18/constant.pm >>>>>>> >>>>>>> UNKNOWN ds_utility >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm >>>>>>> >>>>>>> UNKNOWN exonerate::splice_info >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm >>>>>>> >>>>>>> 0.34 forks /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 2.08001 forks::Devel::Symdump >>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm >>>>>>> 0.34 forks::shared /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> >>>>>>> 0.34 forks::signals >>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm >>>>>>> 1.00 integer /usr/share/perl/5.18/integer.pm >>>>>>> >>>>>>> 0.63 lib /usr/lib/perl/5.18/lib.pm >>>>>>> 1.02 locale /usr/share/perl/5.18/locale.pm >>>>>>> UNKNOWN maker::auto_annotator >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm >>>>>>> >>>>>>> UNKNOWN maker::join >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm >>>>>>> >>>>>>> UNKNOWN maker::quality_index >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm >>>>>>> >>>>>>> UNKNOWN maker::sens_spec >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm >>>>>>> >>>>>>> 1.22 overload /usr/share/perl/5.18/overload.pm >>>>>>> >>>>>>> 0.02 overloading /usr/share/perl/5.18/overloading.pm >>>>>>> >>>>>>> 0.225 parent /usr/share/perl/5.18/parent.pm >>>>>>> UNKNOWN polisher >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::altest >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::est >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm >>>>>>> >>>>>>> UNKNOWN polisher::exonerate::protein >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm >>>>>>> >>>>>>> UNKNOWN repeat_mask_seq >>>>>>> 
/usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm >>>>>>> >>>>>>> 0.1 runlog >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm >>>>>>> UNKNOWN shadow_AED >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm >>>>>>> 1.07 sigtrap /usr/share/perl/5.18/sigtrap.pm >>>>>>> >>>>>>> 1.07 strict /usr/share/perl/5.18/strict.pm >>>>>>> 1.77 threads /usr/local/lib/perl/5.18.1/forks.pm >>>>>>> >>>>>>> 1.33 threads::shared >>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm >>>>>>> 1.03 vars /usr/share/perl/5.18/vars.pm >>>>>>> 1.18 warnings /usr/share/perl/5.18/warnings.pm >>>>>>> >>>>>>> 1.02 warnings::register >>>>>>> /usr/share/perl/5.18/warnings/register.pm >>>>>>> STATUS: Parsing control files... >>>>>>> Calling GI::load_control_files at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452. >>>>>>> Calling GI::new_instance_temp at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463. >>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 465. >>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 483. >>>>>>> STATUS: Processing and indexing input FASTA files... >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 519. >>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 529. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. 
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. 
>>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. 
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 536. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 537. >>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling Iterator::Any::nextFastaRef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling File::NFSLock::unlock at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 539. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. 
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling File::NFSLock::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 574. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling Iterator::Any::nextDef at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575. >>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 575. >>>>>>> Calling GI::build_fasta_index at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622. >>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 623. >>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> STATUS: Setting up database for any GFF3 input... 
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line >>>>>>> 629. >>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 631. >>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 635. >>>>>>> A data structure will be created for you at: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_datastore >>>>>>> >>>>>>> To access files for individual sequences use the datastore index: >>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker >>>>>>> .output/dpp_contig_master_datastore_index.log >>>>>>> >>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker >>>>>>> line 636. >>>>>>> Calling Iterator::Fasta::new at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639. >>>>>>> Calling Iterator::Fasta::skip_file at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641. >>>>>>> Calling Iterator::Fasta::step at >>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643. >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!! 
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... 
>>>>>>> #--------------------------------------------------------------------- >>>>>>> Now retrying the contig!! >>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> Tries: 2!! >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Calling FastaDB::new at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462. >>>>>>> Calling out to BioPerl get_PrimarySeq_stream at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894. >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/ >>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open >>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> line 107. >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108. >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> >>>>>>> examining contents of the fasta file and run log >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::id_to_dir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling Datastore::MD5::mkdir at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling uri_escape at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. >>>>>>> Calling File::Path::mkpath at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> Processing run.log file... >>>>>>> >>>>>>> >>>>>>> Maker is now finished!!! >>>>>>> >>>>>>> Many thanks for your help >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt : >>>>>>> Your problem is one of the following. You need to reinstall the >>>>>>> DBD::SQLite module, you are running in a directory you don't have >>>>>>> permissions for, you set your TMPDIR environment variable or TMP value >>>>>>> in maker_opts.ctl to an NFS-mounted or memory-mounted directory, or you >>>>>>> are using a self-compiled version of Perl (i.e. not /usr/bin/perl) that >>>>>>> has issues (probably with DB or SQLite modules). You can also >>>>>>> completely delete the output directory and start again to see if it was >>>>>>> just a random error. You should look at each of those first. You can >>>>>>> also run MAKER with the --debug command line flag and send the output to me if >>>>>>> none of those seem to be the issue. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> From: Chris Bioinfo >>>>>>> Date: Wednesday, March 19, 2014 at 5:09 AM >>>>>>> To: >>>>>>> Subject: [maker-devel] Annotation with maker2 >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I'm installing/using maker2 for the first time and I get an error when >>>>>>> using it. >>>>>>> >>>>>>> I am certainly missing something, but I don't know what. >>>>>>> >>>>>>> I compiled maker with no error message and I have all these directories >>>>>>> after compilation: >>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README src >>>>>>> >>>>>>> Nevertheless, when I try maker2 on the test data (dpp_contig.fasta) I >>>>>>> get this error: >>>>>>> >>>>>>> STATUS: Now running MAKER... >>>>>>> examining contents of the fasta file and run log >>>>>>> >>>>>>> >>>>>>> >>>>>>> --Next Contig-- >>>>>>> >>>>>>> #--------------------------------------------------------------------- >>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500 >>>>>>> Length: 32156 >>>>>>> #--------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> setting up GFF3 output and fasta chunks >>>>>>> doing repeat masking >>>>>>> DBI >>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) >>>>>>> failed: unable to open database file at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> >>>>>>> Can't call method "do" on an undefined value at >>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm >>>>>>> --> rank=NA, hostname=belem >>>>>>> ERROR: Failed while doing repeat masking >>>>>>> ERROR: Chunk failed at level:0, tier_type:1 >>>>>>> FAILED CONTIG:contig-dpp-500-500 >>>>>>> ... >>>>>>> >>>>>>> Any ideas? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Christelle >>>>>>> >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> >>>>> >>>> >>> >> > From jfierst at uoregon.edu Fri Mar 21 09:43:59 2014 From: jfierst at uoregon.edu (Janna Fierst) Date: Fri, 21 Mar 2014 08:43:59 -0700 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: Hi, I just wanted to say thanks for all your help. I did the reciprocal best blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you > need to know which genes you want to translate names to (two-column input > file, column 1 -> original ID, column 2 -> new ID). I'm not sure EST > forward is the best way to do this, although I do think maker_map_ids is > the tool to use in the end.
The question is how to make a list of IDs to > translate as the input to maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then > do reciprocal best BLAST hits. To do this, you BLAST your reference > proteins against your maker proteins. Then do the opposite: BLAST your > maker proteins against your reference proteins. If they are each > other's best hit, then they are orthologous, and you can safely make a > two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates into > smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, so do you have one strain that you want to use as the reference > for all the others? There's a script that comes with MAKER called > maker_map_ids that lets you use a common prefix or suffix for entries in a > fasta file from one strain and then use est_forward to use that ID in the > gene models for the other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Janna Fierst [jfierst at uoregon.edu] > *Sent:* Friday, March 14, 2014 10:06 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] associating gene names between related strains > > Hi, > > We are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate > the gene naming so that orthologs between species can be associated by > name.
I have been playing around a little with the est_forward option but > can't figure out a good system/workflow that preserves names but still uses > the strain-specific RNA-Seq EST set for the actual gene models. Thanks! > -Janna > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Fri Mar 21 09:54:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 09:54:15 -0600 Subject: [maker-devel] associating gene names between related strains In-Reply-To: References: Message-ID: I'm glad we could help. --Carson From: Janna Fierst Date: Friday, March 21, 2014 at 9:43 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] associating gene names between related strains Hi, I just wanted to say thanks for all your help. I did the reciprocal best blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to associate names between strain assemblies/annotations. Worked perfectly! -Janna On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt wrote: > maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to > know which genes you want to translate names to (two-column input file, column > 1 -> original ID, column 2 -> new ID). I'm not sure EST forward is the best > way to do this, although I do think maker_map_ids is the tool to use in the > end. The question is how to make a list of IDs to translate as the input to > maker_map_ids? > > I would actually just use BLASTP against the reference strain, and then do > reciprocal best BLAST hits. To do this, you BLAST your reference proteins > against your maker proteins. Then do the opposite: BLAST your maker proteins > against your reference proteins.
If they are each other's best hit, then > they are orthologous, and you can safely make a two-column entry for the > maker_map_ids input (i.e. maker-gene-1 translates into smug1). > > --Carson > > > From: Daniel Ence > Date: Friday, March 14, 2014 at 11:32 AM > To: Janna Fierst , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] associating gene names between related strains > > Hi Janna, so do you have one strain that you want to use as the reference for > all the others? There's a script that comes with MAKER called maker_map_ids > that lets you use a common prefix or suffix for entries in a fasta file from > one strain and then use est_forward to use that ID in the gene models for the > other species. > > Let me know if that's not what you're looking for, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna > Fierst [jfierst at uoregon.edu] > Sent: Friday, March 14, 2014 10:06 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] associating gene names between related strains > > Hi, > > We are assembling and annotating genomes for several related strains of > Caenorhabditis worms and I was wondering if there is a way to coordinate the > gene naming so that orthologs between species can be associated by name. I > have been playing around a little with the est_forward option but can't figure > out a good system/workflow that preserves names but still uses the > strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
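Carson's reciprocal-best-hit recipe can be sketched in a few lines of Python. This is a minimal illustration, not a MAKER script: it assumes both BLASTP searches (MAKER proteins against reference proteins, and the reverse) were written in BLAST+ tabular format (`-outfmt 6`, bit score in column 12), and that the ID map fed to maker_map_ids / map_gff_ids is a tab-separated two-column file (original ID, new ID). The file and function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Columns of BLAST+ tabular output (-outfmt 6): 0 = query ID, 1 = subject ID,
# 11 = bit score.
QUERY, SUBJECT, BITSCORE = 0, 1, 11


def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Map each query ID to the subject ID of its highest-scoring hit."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[QUERY], row[SUBJECT]
        score = float(row[BITSCORE])
        # Strict ">" keeps the first hit seen when two hits tie on score.
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(
    maker_vs_ref: Iterable[List[str]], ref_vs_maker: Iterable[List[str]]
) -> List[Tuple[str, str]]:
    """Return sorted (maker_id, reference_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)  # MAKER protein -> best reference protein
    reverse = best_hits(ref_vs_maker)  # reference protein -> best MAKER protein
    return sorted((m, r) for m, r in forward.items() if reverse.get(r) == m)


def read_tabular(path: str) -> List[List[str]]:
    """Read a tab-separated BLAST tabular file into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def write_id_map(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write a two-column, tab-separated file: original ID, new ID."""
    with open(path, "w") as handle:
        for old_id, new_id in pairs:
            handle.write(f"{old_id}\t{new_id}\n")
```

For example, after `blastp -query maker_proteins.fasta -db reference_proteins -outfmt 6 -out maker_vs_ref.tsv` and the reverse search into `ref_vs_maker.tsv`, calling `write_id_map(reciprocal_best_hits(read_tabular("maker_vs_ref.tsv"), read_tabular("ref_vs_maker.tsv")), "maker_to_ref.map")` yields lines such as `maker-gene-1<TAB>smug1`. Ranking by bit score rather than e-value is a design choice here; ties are broken by input order.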
From Hossein.Borhan at AGR.GC.CA Fri Mar 21 10:41:38 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Fri, 21 Mar 2014 16:41:38 +0000 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts In-Reply-To: References: Message-ID: Dear Carson I ran maker with the modified .pm files and it resolved the problem with the fasta output. Thanks a lot for your help. HB On 14-03-17 1:45 PM, "Carson Holt" wrote: >I have attached 4 files for you to place in the .../maker/Widgets/ >directory. > >The *blast.pm files will suppress the BLAST+ failures you are getting >(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same >effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. >In the latest release NCBI redid all their warnings and error codes, so it >spits out a lot of garbage and fails with different messages than it did >before. For example, BLAST now warns you every time it encounters a fasta >header with a comment (virtually every fasta entry in existence falls in >this category), so your screen will be awash with meaningless warning >messages. > >The fgenesh.pm file will fix the other failure, which only occurs if you >use fgenesh simultaneously with the correct_est_fusion=1 option. No other >predictors are affected. > >Thanks, >Carson > > >On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: > >>Dear Carson >> >>Sorry for the late reply. I was away for a couple of days. I have >>uploaded >>the output files plus control and error output to the FTP site that you >>provided. >>The user ID is borhanh >> >>I used blast+ for this run. >> >> >> >> >>Regards >> >> >>HB >> >> >> >> >> >> >> >> >>On 14-03-13 10:00 AM, "Carson Holt" >>wrote: >> >>>Just resending this to the correct maker-devel address. Please, when >>>replying, do not CC the incorrect maker-devel-bounce address.
>>> >>>Thanks, >>>Carson >>> >>> >>>On 3/13/14, 9:56 AM, "Carson Holt" >>>wrote: >>> >>>>FGENESH is not a heavily used tool, so depending on which version it is >>>>(either too old or too new), output might be slightly different, which >>>>could cause incorrect parsing. Could you tar up your maker.output >>>>folder >>>>and upload it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>(send me your user/guest ID after you upload)? >>>> >>>>For the BLAST error, use BLAST+ instead. You are using blastall, which >>>>is >>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>blast type in maker_bopts.ctl and the location of executables in >>>>maker_exe.ctl. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>> >>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Dear Maker users >>>>> >>>>> >>>>>I ran maker (2.31) on a fungal genome and found that it had inserted >>>>>the >>>>>word SCALAR followed by a pair of brackets, like this: (0x22de7020), >>>>>into the nucleotide sequence of some of the genes. This seems >>>>>to >>>>>be related to transcripts predicted by fgenesh_masked.
>>>>> >>>>> >>>>>Here is an example for one of the genes >>>>> >>>>> >>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>offset:0 AE >>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>TTTCGACAAGC >>>>> >>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>without such problem. I checked the sequence for the scaffold related >>>>>to >>>>>one of the affected transcripts and there was no error in the >>>>>sequence. >>>>>I am not sure what is causing this. The only error that I could spot >>>>>in >>>>>the output error file is the following >>>>> >>>>> >>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>contexts/frames of query sequences. 
>>>>> >>>>> >>>>> >>>>>Your help is appreciated >>>>> >>>>> >>>>> >>>>>HB >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >> > From carsonhh at gmail.com Fri Mar 21 10:43:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Mar 2014 10:43:10 -0600 Subject: [maker-devel] non-nucleotide characters in the maker generated transcripts Message-ID: Thanks for letting me know. --Carson On 3/21/14, 10:41 AM, "Borhan, Hossein" wrote: >Dear Carson > >I ran maker with the modified .pm files and it resolved the problem with the >fasta output. Thanks a lot for your help. > > > > >HB > > > > > > > > >On 14-03-17 1:45 PM, "Carson Holt" wrote: > >>I have attached 4 files for you to place in the .../maker/Widgets/ >>directory. >> >>The *blast.pm files will suppress the BLAST+ failures you are getting >>(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same >>effect). BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore. >>In the latest release NCBI redid all their warnings and error codes, so it >>spits out a lot of garbage and fails with different messages than it did >>before. For example, BLAST now warns you every time it encounters a fasta >>header with a comment (virtually every fasta entry in existence falls in >>this category), so your screen will be awash with meaningless warning >>messages. >> >>The fgenesh.pm file will fix the other failure, which only occurs if you >>use fgenesh simultaneously with the correct_est_fusion=1 option. No other >>predictors are affected. >> >>Thanks, >>Carson >> >> >>On 3/14/14, 5:14 PM, "Borhan, Hossein" wrote: >> >>>Dear Carson >>> >>>Sorry for the late reply. I was away for a couple of days. I have >>>uploaded >>>the output files plus control and error output to the FTP site that you >>>provided. >>>The user ID is borhanh >>> >>>I used blast+ for this run.
>>> >>> >>> >>> >>>Regards >>> >>> >>>HB >>> >>> >>> >>> >>> >>> >>> >>> >>>On 14-03-13 10:00 AM, "Carson Holt" >>>wrote: >>> >>>>Just resending this to the correct maker-devel address. Please, when >>>>replying, do not CC the incorrect maker-devel-bounce address. >>>> >>>>Thanks, >>>>Carson >>>> >>>> >>>>On 3/13/14, 9:56 AM, "Carson Holt" >>>>wrote: >>>> >>>>>FGENESH is not a heavily used tool, so depending on which version it >>>>>is >>>>>(either too old or too new), output might be slightly different, which >>>>>could cause incorrect parsing. Could you tar up your maker.output >>>>>folder >>>>>and upload it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>>>(send me your user/guest ID after you upload)? >>>>> >>>>>For the BLAST error, use BLAST+ instead. You are using blastall, which >>>>>is >>>>>the old legacy version of NCBI BLAST. You can do this by setting the >>>>>blast type in maker_bopts.ctl and the location of executables in >>>>>maker_exe.ctl. >>>>> >>>>>Thanks, >>>>>Carson >>>>> >>>>> >>>>> >>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" >>>>>wrote: >>>>> >>>>>>Dear Maker users >>>>>> >>>>>> >>>>>>I ran maker (2.31) on a fungal genome and found that it had inserted >>>>>>the >>>>>>word SCALAR followed by a pair of brackets, like this: (0x22de7020), >>>>>>into the nucleotide sequence of some of the genes. This seems >>>>>>to >>>>>>be related to transcripts predicted by fgenesh_masked.
>>>>>> >>>>>> >>>>>>Here is an example for one of the genes >>>>>> >>>>>> >>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript >>>>>>>offset:0 AE >>>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651 >>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23 >>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA >>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG >>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC >>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT >>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC >>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT >>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA >>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA >>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT >>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT >>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC >>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG >>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG >>>>>>TTTCGACAAGC >>>>>> >>>>>>The same genome sequence was used for the first round of maker (2.10) >>>>>>without such problem. I checked the sequence for the scaffold related >>>>>>to >>>>>>one of the affected transcripts and there was no error in the >>>>>>sequence. >>>>>>I am not sure what is causing this. The only error that I could spot >>>>>>in >>>>>>the output error file is the following >>>>>> >>>>>> >>>>>>[blastall] FATAL ERROR: search cannot proceed due to errors in all >>>>>>contexts/frames of query sequences. 
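Debris like the `SCALAR(0x23418b90)` strings above is what Perl prints when a reference is stringified by mistake. Until the patched fgenesh.pm is in place, a quick scan of a transcript FASTA can show which records are affected. The sketch below is an illustrative Python checker, not part of MAKER: it assumes a plain FASTA file and flags any record whose sequence contains characters outside the IUPAC nucleotide alphabet, reporting any Perl-reference strings it finds. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Perl prints an unintended reference as e.g. SCALAR(0x23418b90).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE|REF)\(0x[0-9a-fA-F]+\)")
# Anything outside the IUPAC nucleotide alphabet (gaps included) is suspect.
NON_IUPAC = re.compile(r"[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_corrupted(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Return (record ID, Perl-reference strings) for every suspect record."""
    suspect = []
    for header, seq in read_fasta(lines):
        refs = PERL_REF.findall(seq)
        if refs or NON_IUPAC.search(seq):
            record_id = (header.split() or [""])[0]
            suspect.append((record_id, refs))
    return suspect
```

Because wrapped lines are joined before matching, a reference split across two lines (as in the `SCALAR(0x23` / `418b90)` example) is still reported in full. Stray `-` characters also trip the alphabet check, so the same scan would catch gap debris of the kind reported later in the "Dashes in transcript predictions" thread.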
>>>>>> >>>>>> >>>>>> >>>>>>Your help is appreciated >>>>>> >>>>>> >>>>>> >>>>>>HB >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Mon Mar 24 04:08:25 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Mon, 24 Mar 2014 10:08:25 +0000 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se> Hi, I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and EST alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel. However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files? Cheers, Marc Marc P. Hoeppner, PhD Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From sujaikumar at gmail.com Mon Mar 24 08:15:16 2014 From: sujaikumar at gmail.com (Sujai) Date: Mon, 24 Mar 2014 14:15:16 +0000 Subject: [maker-devel] Dashes in transcript predictions Message-ID: Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them.
Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. all the .maker.output files are attached. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: nGt.0.3.035610.maker.output.tgz Type: application/x-gzip Size: 45641 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 24 10:49:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 10:49:46 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: I've actually never seen that before, but looking through your output it appears to be specifically caused by setting correct_est_fusion=1, and how it interacts with some features of your dataset. I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download. Thanks, Carson From: Sujai Date: Monday, March 24, 2014 at 8:15 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Dashes in transcript predictions Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA 
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN KKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. All the .maker.output files are attached. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: join.pm Type: text/x-perl-script Size: 18645 bytes Desc: not available URL: From carsonhh at gmail.com Mon Mar 24 11:05:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 11:05:15 -0600 Subject: [maker-devel] Annotations from proteins, follow-up Message-ID: It's not so much intentional as it is a limitation of the information in GFF3 format alignments. Right now protein2genome for eukaryotes will only try to make exonerate-derived alignments work, because they have been polished around splice sites and MAKER still has access to the original protein sequence and alignment CIGAR string for additional filtering, etc. With GFF3 pass-through the algorithm doesn't know nearly as much about what is passed in.
For example, the protein sequence is gone, CIGAR alignment strings are rarely included (Gap= attribute in GFF3), and it's not always clear if the alignment was polished for splice sites. Also, since protein2genome=1 is expected to be used only to generate an initial training set, and not for final annotations, this is considered a reasonable restriction. If you still really want to force protein alignments from a GFF3 to be considered as potential models, you could put them in as pred_gff, in which case they will always be considered as potential models. Of course it will be relatively ugly, because you lack the things I mentioned before, such as the alignment CIGAR string and original protein sequence, that are normally used to filter protein2genome results for inclusion as models. --Carson On 3/24/14, 4:08 AM, "Marc Höppner" wrote: >Hi, > >I had previously inquired about protein-based gene building (for example >to create a training set for SNAP). This is currently possible with Maker >(2.31), but I noticed a limitation. Specifically, I tend to run Maker >once to generate all the raw computes (protein and EST alignments, >mostly). I then separate these out into GFF files that I can store away >and use in various combinations of settings and data in parallel. > >However, the protein2genome option does not seem to work off pre-aligned >protein data (e.g. protein2genome.gff produced with Maker). Is that >intentional and is there a work-around? Or is the only option to run this >with fasta files? > >Cheers, > >Marc > > >Marc P.
Hoeppner, PhD > >Department for Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoeppner at imbim.uu.se > > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Mar 24 12:15:39 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Mar 2014 12:15:39 -0600 Subject: [maker-devel] Dashes in transcript predictions In-Reply-To: References: Message-ID: One more note on this. The sequence is actually fully correct if you just remove the '-' characters. So if you don't want to rerun MAKER with the patch, then you can use the attached script to just repair the transcript file by removing the '-' characters. Your GFF3 files and protein files should already be correct as is. Usage --> perl fix_dash transcript_file.fasta > new_file.fasta You may need to place the script in the .../maker/bin/ directory so it can detect BioPerl if you don't have BioPerl installed system-wide. Thanks, Carson From: Carson Holt Date: Monday, March 24, 2014 at 10:49 AM To: Sujai , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Dashes in transcript predictions I've actually never seen that before, but looking through your output it appears to be specifically caused by setting correct_est_fusion=1, and how it interacts with some features of your dataset. I've attached a patch in the form of a file you can use to replace .../maker/lib/maker/join.pm. I'm also going to add it to the MAKER download. Thanks, Carson From: Sujai Date: Monday, March 24, 2014 at 8:15 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Dashes in transcript predictions Dear Maker Team On a recent run with maker 2.31, I noticed that a couple of the transcripts had dashes/hyphens in them. Example:
Example: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA The protein prediction for this transcript is ok: >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240 MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN KKMVVWVVSSLPSAAIRNAKRRINEQSSHV Is this a known bug? I tried searching for "dash|hyphen" in the email list but couldn't find anything else. Best wishes, - Sujai ps. I pulled out just this one contig and ran maker on it. all the .maker.output files are attached. 
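The repair Carson describes (drop the '-' characters from the transcript FASTA; the GFF3 and protein files are already correct) can be sketched in Python. This is not his attached fix_dash Perl script, only an illustration of the same idea under the assumption of a plain FASTA input. The script name strip_gaps.py is hypothetical. Header lines are deliberately left untouched, because MAKER IDs such as `snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1` contain legitimate hyphens.

```python
import sys
from typing import Iterable, Iterator


def strip_gaps(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with '-' removed from sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            # MAKER IDs such as snap_masked-...-mRNA-1 contain hyphens, so
            # header lines must be passed through unchanged.
            yield line
        else:
            yield line.replace("-", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python strip_gaps.py transcripts.fasta > fixed.fasta
    with open(sys.argv[1]) as fasta:
        sys.stdout.writelines(strip_gaps(fasta))
```

Sequence lines keep their original wrapping, so affected lines simply end up a few characters shorter; FASTA readers generally tolerate uneven line lengths.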

From sujaikumar at gmail.com  Mon Mar 24 12:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions

Wow. That was a super quick response. Thanks very much for confirming the problem and the fixes!

From diana.garnica at anu.edu.au  Mon Mar 24 17:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,

We recently assembled a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?

Thanks!

Diana

From carsonhh at gmail.com  Mon Mar 24 17:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER

You are probably getting the wrong proteins from your scripts because you are not taking into account the 5' and 3' UTRs in the transcript. For example:

>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261 bp and the 3' UTR is 22 bp long. Both would have to be trimmed before translating the transcript into a protein. Once they are trimmed, you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTRs. Example:

fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.

Thanks,
Carson
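Carson's explanation can be made concrete with a small Python sketch (all names are hypothetical and not part of MAKER; fasta_tool --trim_maker_utr remains the supported way to trim). It reads the UTR lengths from the QI field (field 1 = 5' UTR, field 8 = 3' UTR, as in the example above), trims them, and translates the rest in frame 0 with the standard genetic code. It assumes, as in Sujai's earlier transcript (3' UTR of 0, ending in TAA), that the stop codon lies inside the CDS portion, so a translation ending in '*' marks a model with a stop codon.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def utr_lengths(header: str) -> tuple[int, int]:
    """Return (5' UTR, 3' UTR) lengths parsed from the QI field of a header."""
    for token in header.split():
        if token.startswith("QI:"):
            fields = token[len("QI:"):].split("|")
            return int(fields[0]), int(fields[7])
    raise ValueError("header has no QI: field")


def trim_utrs(seq: str, utr5: int, utr3: int) -> str:
    """Remove both UTRs; slicing to len(seq) - utr3 also works when utr3 == 0."""
    return seq[utr5:len(seq) - utr3]


def translate(cds: str) -> str:
    """Translate in frame 0; codons not in the table (e.g. containing N) become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
```

The slice `seq[utr5:len(seq) - utr3]` is used instead of `seq[utr5:-utr3]` because the latter returns an empty string when the 3' UTR length is 0.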

From daniel.standage at gmail.com  Tue Mar 25 07:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image

Greetings,

I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the PATH and "/usr/local/src" is empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 04:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and I detected a situation which appears with a certain frequency (see the attached Apollo screenshot illustrating the situation I am going to describe).

When there is est2genome evidence supporting the prediction of the 5' UTR region, I have noticed that in some of these transcripts MAKER fails to identify the correct downstream ATG start codon and instead takes a TTG codon (coding for L) as the protein start. The proper ATG start codon is further downstream, as the ab initio predictors (SNAP + AUGUSTUS) correctly predict in this case (see the attached screenshot).

Any comments on this?

Thanks!
ernesto

--
Developer

VectorBase | Ensembl Genomes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes
Desc: not available

From carsonhh at gmail.com  Tue Mar 25 08:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>

This is caused by BioPerl's is_start_codon method and default codon table returning true for non-canonical start codons. It was resolved some time ago (see previous discussion --> https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY). Make sure you are using the most recent version of MAKER (currently 2.31).

Thanks,
Carson
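The mechanism Carson describes can be illustrated with a minimal Python sketch (hypothetical names; this is not BioPerl's is_start_codon or MAKER's code). NCBI translation table 1, the standard code, lists TTG and CTG as alternative start codons alongside ATG, so a scan that accepts every table 1 start stops at an upstream in-frame TTG, while an ATG-only scan continues to the downstream ATG, which is what ernesto expected.

```python
from __future__ import annotations

# Start codons listed for NCBI translation table 1 (standard code):
# ATG is canonical, TTG and CTG are alternative starts.
TABLE1_STARTS = frozenset({"ATG", "TTG", "CTG"})
ATG_ONLY = frozenset({"ATG"})


def first_start(seq: str, starts: frozenset[str] = ATG_ONLY) -> int | None:
    """Return the 0-based offset of the first in-frame start codon, or None.

    Only frame 0 is scanned, so seq should already be in the reading
    frame of the ORF being examined.
    """
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in starts:
            return i
    return None
```

For example, in `GCC TTG AAA ATG CCC` the permissive scan returns offset 3 (the TTG) and the ATG-only scan returns offset 9.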
> >ernesto > >-- >Developer > >VectorBase | Ensembl Genomes > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Mar 25 08:24:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Mar 2014 08:24:36 -0600 Subject: [maker-devel] Maker iPlant image In-Reply-To: References: Message-ID: --> /opt/maker/bin/maker It looks like most preinstalled software is under /opt on the image. Thanks, Carson From: Daniel Standage Date: Tuesday, March 25, 2014 at 7:24 AM To: Maker Mailing List Subject: [maker-devel] Maker iPlant image Greetings, I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Tue Mar 25 10:33:59 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 25 Mar 2014 11:33:59 -0500 Subject: [maker-devel] maker to EvidenceModeler Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> Hi Carson and others, Is there an easy tool/pipeline available as part of maker utilities to convert maker and SNAP output to files acceptable by EvidenceModeler? It looks like it also needs just gff files, but with a few tweaks. 
EvidenceModeler seems better equipped to handle PASA annotation results than MAKER results.

Thanks,
Dhivya

From barry.utah at gmail.com  Tue Mar 25 11:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file generated with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi Diana,

There is a Perl library, the Genome Annotation Library (GAL), that is designed to make writing code like this easy. I just added a script to this library called gal_CDS_sequence, which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.

You can access the library here:
http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:
https://lists.sourceforge.net/lists/listinfo/song-devel

Hope that helps,
Barry
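Internal stop codons in extracted CDSs usually come from a few mechanical mistakes: joining CDS rows in file order rather than genomic order, forgetting to reverse-complement minus-strand models, or splicing exon rows (which include UTRs) instead of CDS rows. The Python sketch below (hypothetical names; not GAL's gal_CDS_sequence) shows those three steps for GFF3 lines and an in-memory genome dictionary. It assumes each CDS row names its mRNA in the Parent attribute and ignores the phase column, which is fine for complete models but not for partial ones.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequences(gff_lines: Iterable[str], genome: dict[str, str]) -> dict[str, str]:
    """Splice CDS rows into one coding sequence per parent transcript.

    genome maps seqid -> sequence. GFF3 coordinates are 1-based and
    inclusive, hence start - 1 when slicing.
    """
    parts: dict[str, list[tuple[int, int, str, str]]] = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded FASTA section, if present
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # skip exon/UTR rows: they would add untranslated sequence
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            parts[parent].append((int(cols[3]), int(cols[4]), cols[0], cols[6]))

    result = {}
    for parent, segments in parts.items():
        segments.sort()  # genomic order, regardless of the order in the file
        seq = "".join(genome[seqid][start - 1:end] for start, end, seqid, _ in segments)
        if segments[0][3] == "-":
            seq = revcomp(seq)  # minus strand: reverse-complement the spliced sequence
        result[parent] = seq
    return result
```

Translating each result in frame 0 should then give one protein per mRNA with a single '*' at the end for complete models.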
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From kchilds at plantbiology.msu.edu  Wed Mar 26 08:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant. I have been using the image successfully for more than a month. I typically set several environment variables immediately after starting an ssh session:

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp

The image will allow you to train SNAP, but training Augustus is not possible with the current image. Augustus training requires blat, which was not installed in this image. There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking. I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.

Augustus should work with a stock HMM on this image.
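After running exports like Kevin's, it can help to confirm which executables are actually reachable. A hedged Python sketch (the helper name is hypothetical; it relies only on the standard library's shutil.which) that reports tools not found on the current PATH plus a few extra directories:

```python
from __future__ import annotations

import os
import shutil
from typing import Sequence


def missing_tools(tools: Sequence[str], extra_dirs: Sequence[str] = ()) -> list[str]:
    """Return the names in tools that are not found as executables.

    The search path is the current PATH followed by extra_dirs,
    mirroring the effect of export PATH=$PATH:<extra dirs>.
    """
    dirs = [os.environ.get("PATH", ""), *extra_dirs]
    search = os.pathsep.join(d for d in dirs if d)
    return [name for name in tools if shutil.which(name, path=search) is None]
```

For instance, `missing_tools(["maker", "snap", "augustus", "blat"], ["/opt/maker/bin", "/opt/maker/exe/snap", "/opt/maker/exe/augustus/bin"])` would presumably list blat on the image as Kevin describes it, consistent with his explanation of why Augustus training fails there.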
I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.

Kevin Childs

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Plant Biology Department
Michigan State University

kchilds at plantbiology.msu.edu
517-775-2844 (m)
517-353-5969 (l)

From steinj at cshl.edu  Wed Mar 26 12:41:37 2014
From: steinj at cshl.edu (Stein, Joshua)
Date: Wed, 26 Mar 2014 18:41:37 +0000
Subject: [maker-devel] Maker iPlant image

Also please note that there is a tutorial available here, which is particularly important if you want to use the image in MPI mode.

https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

Josh

Joshua Stein, PhD
Manager, Sci. Informatics III
Cold Spring Harbor Laboratory
steinj at cshl.edu
http://ware.cshl.org/

From brubin at fieldmuseum.org  Sat Mar 29 10:24:05 2014
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sat, 29 Mar 2014 11:24:05 -0500
Subject: [maker-devel] Missing UTRs in GFF

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized that there are a few genes in the GFF file produced by gff3_merge with inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs have their own lines in the GFF file. However, for the problematic genes, the UTRs are not specified in the GFF file and all exons are annotated as CDS. The UTRs do appear in the gene header and the protein sequences are the correct length (do not include the UTR). I have attached an example from the GFF file.
Is this a known problem, or have I done something wrong? Is there an easy way to fix the GFF file?

Thanks for your help,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605 USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_utr.gff
Type: application/octet-stream
Size: 2934 bytes
Desc: not available

From mhinsley at ebi.ac.uk  Mon Mar 31 04:20:10 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 31 Mar 2014 11:20:10 +0100
Subject: [maker-devel] putative preponderance of short exons??
Message-ID: <5339415A.1020509@ebi.ac.uk>

Hi

I've run Maker on a de novo assembly of a species of fly and then ran some simple statistics (intron/exon/CDS length, exons per gene) over the GFF output and compared them with a couple of other species. It all looks good except that there is a surprising number of very short exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see the attached PDF, where black is Drosophila, red is A. gambiae, and green is with 5' and 3' exons removed).

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP. I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence and they all line up with canonical splice sequences (GT----AG), but I've only looked at a few using Apollo.

While there's no requirement that exons should be longer, I'm suspicious of this, as there must be some evolutionary relationship between these species. I've compared with another species annotated with Maker (using SNAP and Augustus) which is more distant (not yet publicly available), and the same pattern of short exons is present. I wondered if they were created to fulfil the need for start/stop codons, but this does not appear to be the case (mostly they are mid-gene).

Is there some way to adjust the predictors, e.g. to require external evidence, or anything else you could suggest? I can see the following in the tutorial, but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no

thanks

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10619 bytes
Desc: not available

From carsonhh at gmail.com  Mon Mar 31 07:52:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 07:52:15 -0600
Subject: [maker-devel] putative preponderance of short exons??
In-Reply-To: <5339415A.1020509@ebi.ac.uk>
References: <5339415A.1020509@ebi.ac.uk>

The intron/exon structure is determined by SNAP, Augustus, etc. It is not affected by any of the MAKER parameters; only evidence alignments are affected by the MAKER settings. You can try retraining or manually editing the HMMs, but these might also be regions where your assembly is incorrect, and those algorithms make short exons in order to make a structure work without getting stop codons mid-gene.

Thanks,
Carson
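To reproduce Malcolm's counts, or to re-check them after retraining SNAP or Augustus as Carson suggests, the exon length distribution can be tabulated directly from the GFF3. A minimal Python sketch with hypothetical names: it groups exon rows by their Parent mRNA, and the drop_terminal option skips each transcript's first and last exon in genomic order, roughly matching the "5' and 3' exons removed" curve. Exons shared by several mRNAs are counted once per mRNA.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def exon_lengths(gff_lines: Iterable[str], drop_terminal: bool = False) -> list[int]:
    """Collect exon lengths (end - start + 1) from GFF3 lines."""
    by_parent: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 may end with an embedded FASTA section
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            by_parent[parent].append((int(cols[3]), int(cols[4])))

    lengths: list[int] = []
    for exons in by_parent.values():
        exons.sort()
        if drop_terminal:
            exons = exons[1:-1]  # single- and two-exon transcripts contribute nothing
        lengths.extend(end - start + 1 for start, end in exons)
    return lengths


def count_shorter_than(lengths: Iterable[int], cutoffs: Iterable[int] = (10, 30, 50)) -> dict[int, int]:
    """Count lengths strictly below each cutoff, like '878 < 10 bp'."""
    values = list(lengths)
    return {c: sum(1 for n in values if n < c) for c in cutoffs}
```

Running it on GFF3 files from successive training rounds would show whether the short, evidence-free exons shrink as the HMMs change.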

From carsonhh at gmail.com  Mon Mar 31 08:37:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 08:37:15 -0600
Subject: [maker-devel] Missing UTRs in GFF

Not something I've seen before, but there was a patch for another issue that was caused by the use of correct_est_fusion=1, and that may be related. Try the current stable release, 2.31, and let me know if it still happens.

You can also upload the contig folder from one of the regions in question here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Then I could verify the bug, and see if it is something that happens in the current release.

--Carson
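Until the bug is confirmed, the affected models can at least be listed. A minimal Python sketch (hypothetical names), assuming MAKER-style mRNA rows carry a _QI attribute laid out like the QI field in the FASTA headers (field 1 = 5' UTR length, field 8 = 3' UTR length): it flags mRNAs whose _QI reports a UTR but which have no five_prime_UTR or three_prime_UTR rows. It only finds the inconsistent genes; it does not repair them.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def mrnas_missing_utrs(gff_lines: Iterable[str]) -> list[str]:
    """Return IDs of mRNAs whose _QI reports UTR length but lack UTR rows."""
    expects_utr: dict[str, bool] = {}
    has_utr: dict[str, bool] = defaultdict(bool)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if cols[2] == "mRNA" and "_QI" in attrs and "ID" in attrs:
            qi = attrs["_QI"].split("|")
            expects_utr[attrs["ID"]] = int(qi[0]) > 0 or int(qi[7]) > 0
        elif cols[2] in UTR_TYPES:
            for parent in filter(None, attrs.get("Parent", "").split(",")):
                has_utr[parent] = True
    return sorted(m for m, want in expects_utr.items() if want and not has_utr[m])
```

The resulting IDs could be used to pick which contig folders to upload to the bug page Carson mentions.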