From matthew.macmanes at unh.edu Tue Apr 1 06:23:59 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Tue, 1 Apr 2014 07:23:59 -0400
Subject: [maker-devel] Installing Maker on Cray

Hello,

I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page

Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has its own proprietary compilers: cc in place of mpicc, and CC in place of mpicxx.

When running the first step in src, 'perl Build.pl', it asks me for the location of mpicc. I can give the full path to the Cray equivalent, cc, but it is not recognized. Many other programs allow me to specify the C compiler, e.g. './configure mpicc=cc', but I cannot seem to do this with Maker.

Any advice?

Thanks, Matt

__________________________________
Matthew MacManes, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

From carson.holt at icloud.com Tue Apr 1 07:58:35 2014
From: carson.holt at icloud.com (Carson Holt)
Date: Tue, 01 Apr 2014 06:58:35 -0600
Subject: [maker-devel] Installing Maker on Cray

Create a soft link called mpicc. I can't guarantee shared libraries are installed on your system, though, as not all system-derived versions of MPICH2 have been configured with shared libraries.

--Carson

From matthew.macmanes at unh.edu Tue Apr 1 11:11:55 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Tue, 1 Apr 2014 12:11:55 -0400
Subject: [maker-devel] Installing Maker on Cray
In-Reply-To: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu>
References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu>

Hi Carson and list:

I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Cray cc compiler is not happy with several of them (forks, to name one).

If anybody has installed Maker on a Cray, please contact me!

Thanks, Matt

From cjfields at illinois.edu Tue Apr 1 11:29:40 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Apr 2014 16:29:40 +0000
Subject: [maker-devel] Installing Maker on Cray
References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu>
Message-ID: <350474CE-B7EB-4EFF-9C8B-AD71FBB81CA3@illinois.edu>

We might be interested in that ourselves at some point:

https://bluewaters.ncsa.illinois.edu

chris

-------------- next part --------------
An HTML attachment was scrubbed...
From jason at bioperl.org Tue Apr 1 13:39:14 2014
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 1 Apr 2014 11:39:14 -0700
Subject: [maker-devel] maker to EvidenceModeler
In-Reply-To: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>
References: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

I've used this script I wrote to make the necessary input files from maker GFF3.

https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/maker2evm.pl

Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Tue, Mar 25, 2014 at 9:33 AM, dhivya arasappan wrote:

> Hi Carson and others,
>
> Is there an easy tool/pipeline available as part of maker utilities to
> convert maker and SNAP output to files acceptable by EvidenceModeler?
>
> It looks like it also needs just gff files, but with a few tweaks.
> EvidenceModeler seems better equipped to handle PASA annotation results
> than maker results.
>
> Thanks
> Dhivya

From carsonhh at gmail.com Tue Apr 1 13:36:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 01 Apr 2014 12:36:44 -0600
Subject: [maker-devel] Missing UTRs in GFF

It was indeed caused by the correct_est_fusion=1 option (which is supposed to trim off UTR when overlap of UTRs across genes appears to be caused by merged mRNA-seq reads). I have attached a patch that is used to replace .../maker/lib/maker/auto_annotator.pm, and I've updated the website download to include the patch as MAKER download version 2.31.3.
Thanks,
Carson

From: Benjamin Rubin
Date: Tuesday, April 1, 2014 at 9:21 AM
To: Carson Holt
Subject: Re: [maker-devel] Missing UTRs in GFF

OK, I think I uploaded everything. I included a cleaned up version of the control file without all of my paths in case that is useful.

Thanks,
Ben

On Tue, Apr 1, 2014 at 9:50 AM, Carson Holt wrote:

> Could you upload your input fasta and hmm files as well? Sometimes I can
> reproduce errors using just the raw reports, but it looks like I will need
> the input files.
>
> --Carson
>
> From: Benjamin Rubin
> Date: Tuesday, April 1, 2014 at 8:38 AM
> To: Carson Holt
> Subject: Re: [maker-devel] Missing UTRs in GFF
>
> Hi Carson,
>
> I tried using version 2.31 on a scaffold where this problem occurred with
> 2.30 and got the same result, unfortunately. I did use correct_est_fusion=1
> both times so this might be related. I have uploaded the sequence for this
> scaffold and the output directory under username "brubin". Is this the data
> that you meant?
>
> I am also reattaching information on a representative problem gene from
> this scaffold that occurs at base 1330779.
>
> Thanks so much for the help,
> Ben
>
> On Mon, Mar 31, 2014 at 9:37 AM, Carson Holt wrote:
>> Not something I've seen before, but there was a patch for another issue
>> that was caused by the use of correct_est_fusion=1, that may be related.
>> Try the current stable release 2.31, and let me know if it still happens.
>>
>> You can also upload the contig folder from one of the regions in question
>> here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> Then I could verify the bug, and see if it is something that happens in
>> the current release.
>>
>> --Carson
>>
>> From: Benjamin Rubin
>> Date: Saturday, March 29, 2014 at 10:24 AM
>> Subject: [maker-devel] Missing UTRs in GFF
>>
>> I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
>> that there are a few genes in the GFF file produced by gff3_merge with
>> inconsistencies in the annotated CDS and UTRs. For most of my genes, the
>> UTRs have their own lines in the GFF file. However, for the problematic
>> genes, the UTRs are not specified in the GFF file and all exons are
>> annotated as CDS. The UTRs do appear in the gene header and the protein
>> sequences are the correct length (do not include the UTR). I have attached
>> an example from the GFF file.
>>
>> Is this a known problem, or have I done something wrong? Is there an easy
>> way to fix the GFF file?
>>
>> Thanks for your help,
>> Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

-------------- next part --------------
A non-text attachment was scrubbed...
Name: auto_annotator.pm
Type: text/x-perl-script
Size: 101567 bytes
Desc: not available

From amelia.ireland at gmod.org Thu Apr 3 16:10:53 2014
From: amelia.ireland at gmod.org (Amelia Ireland)
Date: Thu, 3 Apr 2014 14:10:53 -0700
Subject: [maker-devel] GMOD Online Training 2014

Greetings GMOD community!

Applications are now open for the 2014 GMOD online training course, to be held from May 19th - 23rd 2014. The course will cover the installation, configuration, and usage of core GMOD software, including GBrowse and JBrowse, Galaxy, MAKER, Tripal, WebApollo, Canto, and the Chado database. The course is taught by experienced instructors and developers with deep knowledge of the tools. Although the course will be run online, students will be able to interact with the tutors and fellow attendees, ask questions, and so on.

For more information and to apply, please see http://gmod.org/wiki/GMOD_Online_Training_2014

If you have any questions, please contact the GMOD help desk at help at gmod.org.

Thanks!

--
Amelia Ireland
GMOD Community Support
Generic Model Organism Database project
http://gmod.org || @gmodproject

From Brian.Mack at ARS.USDA.GOV Mon Apr 7 07:55:01 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Mon, 7 Apr 2014 12:55:01 +0000
Subject: [maker-devel] maker_functional_gff

Hi,

I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file:

Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097.
Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128.
Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI

Any ideas of what I'm doing wrong?

Brian

From carsonhh at gmail.com Mon Apr 7 09:58:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 08:58:20 -0600
Subject: [maker-devel] maker_functional_gff

maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS=, GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder.

--Carson

From carsonhh at gmail.com Mon Apr 7 10:02:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 09:02:55 -0600
Subject: [maker-devel] maker_functional_gff

I added a line to look for the UniRef header format in the attached scripts. Go ahead and give it a try.
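For anyone who cannot use the patched scripts, the header rewrite suggested earlier in this thread (Tax= to OS=, RepID= to GN=, plus a placeholder PE= at the end) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the placeholder value "1" is a guess, and whether maker_functional_gff accepts the rewritten headers has not been verified here.

```python
import re


def uniref_to_uniprot_style(header: str, pe_placeholder: str = "1") -> str:
    """Rewrite one UniRef FASTA header so it carries OS=, GN= and PE= tags.

    Lines that are not FASTA headers are returned unchanged.
    The placeholder value for PE= is an assumption, not a documented requirement.
    """
    if not header.startswith(">"):
        return header  # sequence line: leave untouched
    header = header.rstrip("\n")
    # Rename the tags only; the values after "=" are kept as they are.
    header = re.sub(r"\bTax=", "OS=", header)
    header = re.sub(r"\bRepID=", "GN=", header)
    # Append a placeholder PE= tag if the header does not already have one.
    if not re.search(r"\bPE=", header):
        header += f" PE={pe_placeholder}"
    return header


def convert_fasta(src_path: str, dst_path: str) -> None:
    """Copy a FASTA file, rewriting header lines and leaving sequences as-is."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(uniref_to_uniprot_style(line) + "\n")
            else:
                dst.write(line)
```

With a header from the error output above, `uniref_to_uniprot_style(">UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI")` gives `>UniRef50_K1R9E3 Uncharacterized protein n=1 OS=Crassostrea gigas GN=K1R9E3_CRAGI PE=1`. The rewritten FASTA would then be passed to the script in place of the original uniref-50 file.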
--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_functional_fasta
Type: application/octet-stream
Size: 3451 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_functional_gff
Type: application/octet-stream
Size: 4102 bytes
Desc: not available

From darasappan at gmail.com Mon Apr 7 10:57:08 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 7 Apr 2014 10:57:08 -0500
Subject: [maker-devel] keep_preds parameter
Message-ID: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>

Hello,

I'm looking for a little more explanation about the keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my EST or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?
Thanks
dhivya

From dence at genetics.utah.edu Mon Apr 7 11:06:15 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 7 Apr 2014 16:06:15 +0000
Subject: [maker-devel] keep_preds parameter
In-Reply-To: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>

Hi Dhivya,

That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with an AED equal to zero have no support from the evidence sets (protein and EST and alt_EST).

~Daniel

From darasappan at gmail.com Mon Apr 7 11:31:55 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 7 Apr 2014 11:31:55 -0500
Subject: [maker-devel] keep_preds parameter
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>

Thank you Daniel. But I thought an AED score of zero indicates complete agreement of annotation to evidence and that 1 would mean no agreement?
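For reference, AED runs from 0 (annotation agrees completely with the evidence) to 1 (no evidence support), so the unsupported models retained by keep_preds=1 should be the ones with an AED of 1. A minimal Python sketch for listing them from a MAKER GFF3 file might look like the following. It assumes the `_AED=` attribute sits on the mRNA features in column 9; the function names and the sample IDs in the usage note are hypothetical.

```python
def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def transcripts_by_aed(gff_lines, threshold=1.0):
    """Yield (ID, AED) for mRNA features whose _AED is >= threshold."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may end with an embedded FASTA section
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" not in attrs:
            continue
        aed = float(attrs["_AED"])
        if aed >= threshold:
            yield attrs.get("ID", ""), aed
```

Usage would be along the lines of `with open("maker.gff") as fh: unsupported = list(transcripts_by_aed(fh))`, where `maker.gff` stands for the merged GFF3 file. Lowering `threshold` gives a looser cut, for example 0.5 to list every transcript with weak support.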
Dhivya

From carsonhh at gmail.com Mon Apr 7 11:33:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 10:33:59 -0600
Subject: [maker-devel] keep_preds parameter
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>

True. Daniel had the numbers backwards (I often accidentally do that as well).

--Carson

From nextgen.usfs at gmail.com Mon Apr 7 17:34:32 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Mon, 7 Apr 2014 17:34:32 -0500
Subject: [maker-devel] fasta_merge ARRAY error

Hello,

I'm getting an error when running fasta_merge as follows:

Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942.
The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file? Thanks, Jon From dence at genetics.utah.edu Mon Apr 7 20:23:07 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 8 Apr 2014 01:23:07 +0000 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: Hi Jon, Will you please send the command that gave you that error? Also, will you upload the maker control files you used and the gff3 file to the URL below? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 Also, which version of MAKER are you using? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS Ion PGM [nextgen.usfs at gmail.com] Sent: Monday, April 07, 2014 4:34 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] fasta_merge ARRAY error Hello, I?m getting an error when running fasta_merge as follows: Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942. The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file? 
Thanks, Jon _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Apr 7 21:02:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 20:02:30 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: What version of MAKER are you using, and did you run with the new trnascan option turned on? Basically the script is finding a fasta file for transcripts but the file for proteins is missing. Turning trnascan on can do this (obviously tRNAs can encode transcripts but don't encode proteins). The version of fasta_merge included in the current MAKER 2.31.3 download should handle this correctly. --Carson On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >Hi Jon, Will you please send the command that gave you that error? Also, >will you upload the maker control files you used and the gff3 file to the >URL below? > >http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 > >Also, which version of MAKER are you using? > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >Ion PGM [nextgen.usfs at gmail.com] >Sent: Monday, April 07, 2014 4:34 PM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] fasta_merge ARRAY error > >Hello, > >I?m getting an error when running fasta_merge as follows: > >Can't use an undefined value as an ARRAY reference at >/home/ngs/maker/bin/fasta_merge line 116, line 1942. > >The result is that the fasta files are somewhat truncated, that is they >do not match the gff3 file created from gff3_merge (which does run >without any errors). Seems like it is getting stuck somewhere and then >crashes. 
Is there another way to easily get the CDS out of the maker >generated GFF file? > >Thanks, > >Jon > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nextgen.usfs at gmail.com Tue Apr 8 07:56:22 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Tue, 8 Apr 2014 07:56:22 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Hi Carson and Daniel, I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so perhaps I should just get fasta_merge from 2.31.3 and give it a shot. But first to clarify, fasta_merge -d maker1_master_datastore_index.log - returns the appropriate files, however both the maker.all.proteins.fasta and maker.all.transcripts.fasta return 7401 with a grep command counting ?>?, while the gff3_merge -d maker1_master_datastore_index.log runs without failure and a grep command counting ?gene? returns 7525 models. I uploaded the files requested below. Thanks for the help. -Jon On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > What version of MAKER are you using, and did you run with the new trnascan > option turned on? Basically the script is finding a fasta file for > transcripts but the file for proteins is missing. Turning trnascan on can > do this (obviously tRNAs can encode transcripts but don't encode > proteins). The version of fasta_merge included in the current MAKER > 2.31.3 download should handle this correctly. > > --Carson > > > > On 4/7/14, 7:23 PM, "Daniel Ence" wrote: > >> Hi Jon, Will you please send the command that gave you that error? 
Also, >> will you upload the maker control files you used and the gff3 file to the >> URL below? >> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >> >> Also, which version of MAKER are you using? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >> Ion PGM [nextgen.usfs at gmail.com] >> Sent: Monday, April 07, 2014 4:34 PM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] fasta_merge ARRAY error >> >> Hello, >> >> I'm getting an error when running fasta_merge as follows: >> >> Can't use an undefined value as an ARRAY reference at >> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >> >> The result is that the fasta files are somewhat truncated, that is they >> do not match the gff3 file created from gff3_merge (which does run >> without any errors). Seems like it is getting stuck somewhere and then >> crashes. Is there another way to easily get the CDS out of the maker >> generated GFF file?
>> >> Thanks, >> >> Jon >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Tue Apr 8 09:54:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 08:54:05 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: I've attached the fixed version (I see that the patched one is not in 2.31.3, but I'll get that taken care of). The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other files will have only the coding genes. --Carson On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: >Hi Carson and Daniel, >I'm running Maker 2.31.2 and yes I did have tRNAscan turned on - so >perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >returns the appropriate files, however both the maker.all.proteins.fasta >and maker.all.transcripts.fasta return 7401 with a grep command counting >">", while the gff3_merge -d maker1_master_datastore_index.log runs >without failure and a grep command counting "gene" returns 7525 models. > >I uploaded the files requested below. Thanks for the help. > >-Jon > > >On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > >> What version of MAKER are you using, and did you run with the new >>trnascan >> option turned on? Basically the script is finding a fasta file for >> transcripts but the file for proteins is missing. Turning trnascan on >>can >> do this (obviously tRNAs can encode transcripts but don't encode >> proteins).
The version of fasta_merge included in the current MAKER >> 2.31.3 download should handle this correctly. >> >> --Carson >> >> >> >> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >> >>> Hi Jon, Will you please send the command that gave you that error? >>>Also, >>> will you upload the maker control files you used and the gff3 file to >>>the >>> URL below? >>> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>> >>> Also, which version of MAKER are you using? >>> >>> Thanks, >>> Daniel >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>USFS >>> Ion PGM [nextgen.usfs at gmail.com] >>> Sent: Monday, April 07, 2014 4:34 PM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] fasta_merge ARRAY error >>> >>> Hello, >>> >>> I'm getting an error when running fasta_merge as follows: >>> >>> Can't use an undefined value as an ARRAY reference at >>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>> >>> The result is that the fasta files are somewhat truncated, that is they >>> do not match the gff3 file created from gff3_merge (which does run >>> without any errors). Seems like it is getting stuck somewhere and then >>> crashes. Is there another way to easily get the CDS out of the maker >>> generated GFF file?
>>> >>> Thanks, >>> >>> Jon >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: fasta_merge Type: application/octet-stream Size: 2977 bytes Desc: not available URL: From nextgen.usfs at gmail.com Tue Apr 8 11:01:18 2014 From: nextgen.usfs at gmail.com (Jon Palmer) Date: Tue, 08 Apr 2014 11:01:18 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: <53441D4E.2070502@gmail.com> Thanks Carson, error is gone and is now working. Thanks for a great tool and for the fantastic support! -Jon On 04/08/2014 09:54 AM, Carson Holt wrote: > I've attached the fixed version (I see that the patched one is not in > 2.31.3, but I'll get that taken care of). > > The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other > files will have only the coding genes. > > --Carson > > > > On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: > >> Hi Carson and Daniel, >> I'm running Maker 2.31.2 and yes I did have tRNAscan turned on - so >> perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >> But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >> returns the appropriate files, however both the maker.all.proteins.fasta >> and maker.all.transcripts.fasta return 7401 with a grep command counting >> ">", while the gff3_merge -d maker1_master_datastore_index.log runs >> without failure and a grep command counting "gene" returns 7525 models. >> >> I uploaded the files requested below. Thanks for the help.
>> >> -Jon >> >> >> On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: >> >>> What version of MAKER are you using, and did you run with the new >>> trnascan >>> option turned on? Basically the script is finding a fasta file for >>> transcripts but the file for proteins is missing. Turning trnascan on >>> can >>> do this (obviously tRNAs can encode transcripts but don't encode >>> proteins). The version of fasta_merge included in the current MAKER >>> 2.31.3 download should handle this correctly. >>> >>> --Carson >>> >>> >>> >>> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >>> >>>> Hi Jon, Will you please send the command that gave you that error? >>>> Also, >>>> will you upload the maker control files you used and the gff3 file to >>>> the >>>> URL below? >>>> >>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>>> >>>> Also, which version of MAKER are you using? >>>> >>>> Thanks, >>>> Daniel >>>> >>>> >>>> Daniel Ence >>>> Graduate Student >>>> Eccles Institute of Human Genetics >>>> University of Utah >>>> 15 North 2030 East, Room 2100 >>>> Salt Lake City, UT 84112-5330 >>>> ________________________________________ >>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>> USFS >>>> Ion PGM [nextgen.usfs at gmail.com] >>>> Sent: Monday, April 07, 2014 4:34 PM >>>> To: maker-devel at yandell-lab.org >>>> Subject: [maker-devel] fasta_merge ARRAY error >>>> >>>> Hello, >>>> >>>> I'm getting an error when running fasta_merge as follows: >>>> >>>> Can't use an undefined value as an ARRAY reference at >>>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>>> >>>> The result is that the fasta files are somewhat truncated, that is they >>>> do not match the gff3 file created from gff3_merge (which does run >>>> without any errors). Seems like it is getting stuck somewhere and then >>>> crashes. Is there another way to easily get the CDS out of the maker >>>> generated GFF file?
>>>> >>>> Thanks, >>>> >>>> Jon >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> From sjackman at gmail.com Tue Apr 8 14:21:38 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 8 Apr 2014 12:21:38 -0700 Subject: [maker-devel] Changing rmlib runs RepeatRunner Message-ID: Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 8 15:00:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 14:00:11 -0600 Subject: [maker-devel] Changing rmlib runs RepeatRunner In-Reply-To: References: Message-ID: RepeatRunner runs on what was not masked by RepeatMasker, so changing rmlib can cause RepeatRunner to give slightly different results because RepeatMasker results changed. --Carson From: Shaun Jackman Date: Tuesday, April 8, 2014 at 1:21 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Changing rmlib runs RepeatRunner Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sjackman at gmail.com Thu Apr 10 13:34:34 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 10 Apr 2014 11:34:34 -0700 Subject: [maker-devel] Using GlimmerHMM with MAKER Message-ID: The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA and CDS features, but it does not include gene or exon features, and so it does not appear to be working with MAKER. Has anyone else used GlimmerHMM with MAKER, and how did you deal with this issue? Cheers, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 10 13:53:55 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Apr 2014 12:53:55 -0600 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match for mRNA and match_part for CDS. Then it will be interpreted as a two level alignments feature which can be given to pred_gff. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, April 10, 2014 at 12:34 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Using GlimmerHMM with MAKER The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA and CDS features, but it does not include gene or exon features, and so it does not appear to be working with MAKER. Has anyone else used GlimmerHMM with MAKER, and how did you deal with this issue? Cheers, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu Apr 10 16:32:55 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 10 Apr 2014 14:32:55 -0700 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: Thanks, Carson. That helps. 
I'm trying to do a completely ab initio gene annotation without any EST or protein homology evidence, at least for now. The GFF file produced by maker is empty. How do I carry the GlimmerHMM pred_gff (or model_gff) annotations through to the end? Ultimately, I'd like to merge annotations from multiple ab initio predictions. Cheers, Shaun On 10 April 2014 11:53, Carson Holt wrote: > Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match > for mRNA and match_part for CDS. Then it will be interpreted as a two > level alignments feature which can be given to pred_gff. > > --Carson > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, April 10, 2014 at 12:34 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Using GlimmerHMM with MAKER > > The GlimmerHMM gene prediction software outputs a GFF file that includes > mRNA and CDS features, but it does not include gene or exon features, and > so it does not appear to be working with MAKER. Has anyone else used > GlimmerHMM with MAKER, and how did you deal with this issue? > > Cheers, > Shaun > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 10 16:35:17 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Apr 2014 15:35:17 -0600 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: keep_preds=1 will force MAKER to keep ab initio results even if there is no evidence supporting them. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, April 10, 2014 at 3:32 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Using GlimmerHMM with MAKER Thanks, Carson. That helps.
I'm trying to do a completely ab initio gene annotation without any est or protein homology evidence, at least for now. The GFF file produce by maker is empty. How do I carry the GlimmerHMM pred_gff (or model_gff) annotations through to the end? Ultimately, I'd like to merge annotations from multiple ab initio predictions. Cheers, Shaun On 10 April 2014 11:53, Carson Holt wrote: > Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match for > mRNA and match_part for CDS. Then it will be interpreted as a two level > alignments feature which can be given to pred_gff. > > --Carson > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, April 10, 2014 at 12:34 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Using GlimmerHMM with MAKER > > The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA > and CDS features, but it does not include gene or exon features, and so it > does not appear to be working with MAKER. Has anyone else used GlimmerHMM with > MAKER, and how did you deal with this issue? > > Cheers, > Shaun > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu Apr 10 17:51:34 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 10 Apr 2014 15:51:34 -0700 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: That worked! Thanks again, Carson. A note for the record: I found that keep_preds=1 carries forward pred_gff annotations, but not model_gff annotations when that GFF file uses match and match_part annotations (like a munged GlimmerHMM GFF file), which makes sense I guess now that I think about it.
Cheers, Shaun On 10 April 2014 14:35, Carson Holt wrote: > keep_preds=1 will force MAKER to keep ab initio results even if their is > no evidence supporting them. > > --Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, April 10, 2014 at 3:32 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Using GlimmerHMM with MAKER > > Thanks, Carson. That helps. I'm trying to do a completely ab initio gene > annotation without any est or protein homology evidence, at least for now. > The GFF file produce by maker is empty. How do I carry the GlimmerHMM > pred_gff (or model_gff) annotations through to the end? Ultimately, I'd > like to merge annotations from multiple ab initio predictions. > > Cheers, > Shaun > > > On 10 April 2014 11:53, Carson Holt wrote: > >> Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute >> match for mRNA and match_part for CDS. Then it will be interpreted as a >> two level alignments feature which can be given to pred_gff. >> >> --Carson >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Thursday, April 10, 2014 at 12:34 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] Using GlimmerHMM with MAKER >> >> The GlimmerHMM gene prediction software outputs a GFF file that includes >> mRNA and CDS features, but it does not include gene or exon features, and >> so it does not appear to be working with MAKER. Has anyone else used >> GlimmerHMM with MAKER, and how did you deal with this issue? >> >> Cheers, >> Shaun >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
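The conversion suggested earlier in this thread (rename mRNA to match and CDS to match_part so the file can be passed to pred_gff) could be sketched in a few lines of Python. This is only an illustration and not part of MAKER: the function name glimmer_to_match and the example feature lines are made up, and it assumes the GlimmerHMM output is already tab-delimited GFF3 with the feature type in column 3.

```python
def glimmer_to_match(lines):
    """Rename mRNA -> match and CDS -> match_part in GFF3 feature lines."""
    rename = {"mRNA": "match", "CDS": "match_part"}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Pass comments, directives and blank lines through untouched.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # GFF3 feature lines have nine tab-separated columns; column 3 is the type.
        if len(cols) == 9 and cols[2] in rename:
            cols[2] = rename[cols[2]]
        out.append("\t".join(cols))
    return out


# Hypothetical two-feature example; the IDs are invented for illustration.
example = [
    "##gff-version 3",
    "scaffold1\tGlimmerHMM\tmRNA\t100\t900\t.\t+\t.\tID=gene1;Name=gene1",
    "scaffold1\tGlimmerHMM\tCDS\t100\t300\t.\t+\t0\tID=gene1.cds1;Parent=gene1",
]

for row in glimmer_to_match(example):
    print(row)
```

The rewritten file would then go to pred_gff, together with keep_preds=1 when there is no supporting evidence, as described in the messages above.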
URL: From carsonhh at gmail.com Thu Apr 10 17:55:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Apr 2014 16:55:07 -0600 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: The model_gff option can only take gene/mRNA/exon/CDS features, and will ignore match/match_part features. It's a little more restrictive. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, April 10, 2014 at 4:51 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Using GlimmerHMM with MAKER model_gff -------------- next part -------------- An HTML attachment was scrubbed... URL: From rbharris at uw.edu Mon Apr 14 20:45:13 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Mon, 14 Apr 2014 18:45:13 -0700 Subject: [maker-devel] empty genome.ann/genome.dna Message-ID: Hi, I recently set up MAKER on a new computer and am having trouble running a dataset that was run successfully on a different computer. After MAKER finished, I ran gff3_merge and maker2zff, which returned empty genome.ann and genome.dna files. I have tried installing older versions of dependencies and have tinkered with the control files but I still can't figure out what the issue is. The only difference I can find is that the .all.gff file from a successful run has lines at the beginning of the file reporting the success of exonerate. On the failing version of maker - these are not reported - it just goes straight to fasta output. However, exonerate appears to work successfully when run outside of the maker pipeline. Any help would be greatly appreciated. Thanks! Rebecca -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Tue Apr 15 10:33:45 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 15 Apr 2014 09:33:45 -0600 Subject: [maker-devel] empty genome.ann/genome.dna In-Reply-To: References: Message-ID: Could you upload your control files and job input files here--> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi I'll take a look to see if there is any problem with your job's setup. Also what version of MAKER are you running? --Carson From: Rebecca Harris Date: Monday, April 14, 2014 at 7:45 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] empty genome.ann/genome.dna Hi, I recently set up MAKER on a new computer and am having trouble running a dataset that was run successfully on a different computer. After MAKER is finished, I ran gff3_merge and maker2zff and it returns empty genome.ann and genome.dna files. I have tried installing older versions of dependencies and have tinkered with the control files but I still can't figure out what the issue is. The only difference I can find is that the .all.gff file from a successfully run file has lines at the beginning of the file reporting the success of exonerate. On the failing version of maker - these are not reported - it just goes strait to fasta output. However, exonerate appears to work successfully when run outside of the maker pipeline. Any help would be greatly appreciated. Thanks! Rebecca _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bioinformatics.umd at gmail.com Tue Apr 15 12:01:37 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Tue, 15 Apr 2014 13:01:37 -0400 Subject: [maker-devel] passing names from a gff to new predictions Message-ID: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Hello I have an interesting issue with an existing Maker gff. I have a gff file with human friendly names that I would like to pass to the new predictions. However, some of those genes in the human friendly gff file are incorrect or have errors. If I use the gff as model_gff or pred_gff with the map_forward=1 the names move but so do the incorrect models. Maker simply duplicates these predictions to the new outputs. If I remove the GFF file from the ctl file I get new predictions, that have the necessary corrections but they now have unfriendly names. Do you have any suggestions on how to associate the old names with the new predictions? I could simply BLAST the old proteins vs the new ones and associate them in that manner but I was wondering if there were any other options within Maker. Since I have the GFF files I also have the associated transcripts and proteins. Do I need to do some iteration of est2genome then generate a new model gff file? The issue we are dealing with is thousands of short introns in our gff file. These are less than 20 bp and are not biologically feasible so we are trying to correct the gene model predictions. Cheers Ian From carsonhh at gmail.com Tue Apr 15 12:31:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 15 Apr 2014 11:31:35 -0600 Subject: [maker-devel] passing names from a gff to new predictions In-Reply-To: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Message-ID: If you give anything to pred_gff or model_gff then it is allowed to compete as a predictor and thus can end up in the final results.
You stated that the models you are passing in have errors, and that you don't want them to be allowed to compete and end up in your final models, correct? MAKER is not made to expect erroneous input, so I don't have an easy solution for you (I do have a less easy solution though, but you will need to do some editing of the MAKER code).

1. Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or vi.
2. Search for the 'best_annotations' subroutine (around line 1248 depending on which version of MAKER you have).
3. Then edit it as follows:

This is how the top section of the subroutine should look at first -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = @{$CTL_OPT->{_predictor}};

        ...

Change it to this -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}};

        ...

Now run maker again with your old GFF3 file as input to model_gff, and just remember to change the MAKER code back to the way it was when you're done with everything. Basically the change will hard filter model_gff results from being allowed into your final annotations. So names will still move from model_gff to your final results with the map_forward=1 option but none of the old models will make it as gene/mRNA/exon/CDS features in the final GFF3 (they will still be listed as match/match_part reference features though).

Thanks, Carson

On 4/15/14, 11:01 AM, "UMD Bioinformatics" wrote: > Hello > > I have an interesting issue with an existing Maker gff. I have a gff file with > human friendly names that I would like to pass to the new predictions. > However, some of those genes in the human friendly gff file are incorrect or > have errors. If I use the gff as model_gff or pred_gff with the map_forward=1 > the names move but so do the incorrect models. Maker simply duplicates these > predictions to the new outputs.
If I remove the GFF file from the ctl file I > get new predictions, that have the necessary corrections but they now have > unfriendly names. Do you have any suggestions on how to associate the old > names with the new predictions? I could simple blast the old proteins vs the > new ones and associate them in that manor but I was wondering if there were > any other options within Maker. > > Since I have the GFF files I also have the associated transcripts and > proteins. > Do I need to do some iteration of est2/genome then generate a new model gff > file? > > The issue we are dealing with is thousands of short introns in our gff file. > These are less than 20 bp and are not biologically feasible so we are trying > to correct the gene model predictions. > > Cheers > Ian > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bioinformatics.umd at gmail.com Tue Apr 15 12:54:00 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Tue, 15 Apr 2014 13:54:00 -0400 Subject: [maker-devel] passing names from a gff to new predictions In-Reply-To: References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Message-ID: <31BC21FD-D9D6-4B66-B0D7-C48FBC3B7A98@gmail.com> Carson, That seems to fix this issue. Thanks for the insight not something I would have ever come up with. Cheers Ian On Apr 15, 2014, at 1:31 PM, Carson Holt wrote: > If you give anything to pred_gff or model_gff then it is allowed to compete as a predictor and thus can end up in the final results. You stated that the models you are passing in have errors, and you don't want them to be allowed to compete and end up in your final models? Correct. 
> > MAKER is not made to expect erroneous input, so I don't have an easy solution for you (I do have a less easy solution though; but you will need to do some editing of the MAKER code). > > Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or vi. > Search for the 'best_annotations' subroutine (around line 1248 depending on which version of MAKER you have). > Then edit it as follows: > > This is how the top section of the subroutine should look at first --> > > sub best_annotations { > my $annotations = shift; > my $CTL_OPT = shift; > > my @predictors = @{$CTL_OPT->{_predictor}}; > > ... > > Change it to this --> > > sub best_annotations { > my $annotations = shift; > my $CTL_OPT = shift; > > my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}}; > > ... > > > > Now run maker again with your old GFF3 file as input to model_gff, and just remember to change the MAKER code back to the way it was when your done with everything. Basically the change will hard filter model_gff results from being allowed into your final annotations. So names will still move from model_gff to your final results with the map_forward=1 option but none of the old models will make it as gene/mRNA/exon/CDS features in the final GFF3 (they will still be listed as match/match_part reference features though). > > Thanks, > Carson > > > > On 4/15/14, 11:01 AM, "UMD Bioinformatics" wrote: > >> Hello >> >> I have an interesting issue with an existing Maker gff. I have a gff file with human friendly names that I would like to pass to the new predictions. However, some of those genes in the human friendly gff file are incorrect or have errors. If I use the gff as model_gff or pred_gff with the map_forward=1 the names move but so do the incorrect models. Maker simply duplicates these predictions to the new outputs. If I remove the GFF file from the ctl file I get new predictions, that have the necessary corrections but they now have unfriendly names. 
Do you have any suggestions on how to associate the old names with the new predictions? I could simple blast the old proteins vs the new ones and associate them in that manor but I was wondering if there were any other options within Maker. >> >> Since I have the GFF files I also have the associated transcripts and proteins. >> Do I need to do some iteration of est2/genome then generate a new model gff file? >> >> The issue we are dealing with is thousands of short introns in our gff file. These are less than 20 bp and are not biologically feasible so we are trying to correct the gene model predictions. >> >> Cheers >> Ian >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.king at rothamsted.ac.uk Wed Apr 16 06:27:09 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Wed, 16 Apr 2014 11:27:09 +0000 Subject: [maker-devel] scalar text in maker transcripts Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8DAC@rothex1.rothamsted.ac.uk> Hi, I've got some strange characters in my maker transcripts (I used keep predictions). I opened the file in wordpad ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad 88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT Any ideas what may cause this? Thanks Rob -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning. -------------- next part -------------- An HTML attachment was scrubbed... 
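Text of the form SCALAR(0x...) is what Perl produces when a scalar reference is printed as a string, so finding it inside a sequence means a reference was written out in place of the sequence it points to. A rough way to list the affected records is sketched below in Python. The function name and the FASTA headers are made up for illustration; the sequence lines are the ones quoted in the message above.

```python
import re

# Perl stringifies a reference as TYPE(0xADDRESS), e.g. SCALAR(0xf5397d8).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE|REF|GLOB)\(0x[0-9a-fA-F]+\)")


def records_with_perl_refs(fasta_lines):
    """Return (header, count) for each record containing stringified references."""
    records = []  # each entry is [header, [sequence lines]]
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], []])
        elif line and records:
            records[-1][1].append(line)

    hits = []
    for header, chunks in records:
        # Join wrapped lines first: a reference string can straddle a line break.
        seq = "".join(chunks)
        n = len(PERL_REF.findall(seq))
        if n:
            hits.append((header, n))
    return hits


# Hypothetical headers; the first record reuses the sequence from the report.
example = [
    ">example-mRNA-1",
    "ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC",
    "AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad",
    "88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT",
    ">example-mRNA-2",
    "ACGTACGTACGT",
]

for header, count in records_with_perl_refs(example):
    print(header, count)
```

Joining the wrapped lines before searching matters here because the second reference in the reported sequence is split across two lines.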
URL: From carsonhh at gmail.com Wed Apr 16 16:56:25 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 16 Apr 2014 15:56:25 -0600 Subject: [maker-devel] scalar text in maker transcripts Message-ID: The only time I have seen this is when fgenesh is used as a predictor and correct_est_fusion=1 is set (it was a bug in trimming long UTR's on fgenesh models). Is that how you have your job configured? If so, that particular bug is fixed in the current MAKER release. Thanks, Carson From: "Robert King (RRes-Roth)" Date: Wednesday, April 16, 2014 at 5:27 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] scalar text in maker transcripts Hi, I've got some strange characters in my maker transcripts (I used keep predictions). I opened the file in wordpad ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad 88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT Any ideas what may cause this? Thanks Rob -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.king at rothamsted.ac.uk Wed Apr 16 16:57:44 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Wed, 16 Apr 2014 21:57:44 +0000 Subject: [maker-devel] scalar text in maker transcripts In-Reply-To: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk> References: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk> Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8E85@rothex1.rothamsted.ac.uk> Yep I am. I'll try upgrading.
Thanks Rob
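For anyone who wants to check whether their own transcript FASTA shows the symptom reported in this thread, here is a minimal sketch. It is not part of MAKER; the function name `find_perl_refs` and the record names in the sample are made up for illustration. It joins the sequence lines of each record before searching, because FASTA line wrapping can split a stringified Perl reference such as SCALAR(0x...) across two lines, as in the example posted above.

```python
import re

# Matches a stringified Perl reference such as SCALAR(0xf5397d8) or ARRAY(0x188a8578).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE|REF)\(0x[0-9a-fA-F]+\)")


def find_perl_refs(fasta_text):
    """Return {record_id: [matched strings]} for FASTA records containing Perl refs.

    Sequence lines are concatenated per record first, so a reference that was
    broken by line wrapping is still found.
    """
    hits = {}
    record_id = None
    chunks = []

    def flush():
        # Search the record collected so far and store any matches.
        if record_id is not None:
            found = PERL_REF.findall("".join(chunks))
            if found:
                hits[record_id] = found

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif line:
            chunks.append(line)
    flush()
    return hits


# Hypothetical two-record sample built from the sequence lines quoted in the thread.
sample = """>transcript_1
AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad
88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT
>transcript_2
ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC
"""

print(find_perl_refs(sample))
```

Only records with at least one match are reported, so an empty result means nothing suspicious was found. This only detects the symptom; upgrading MAKER, as suggested above, addresses the cause.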
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From muriel.grosb at gmail.com Mon Apr 7 07:29:42 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 7 Apr 2014 14:29:42 +0200
Subject: [maker-devel] Help for Repeat Library Construction
Message-ID: <474C2DF8-B5DF-424B-BCF7-EC64BC23EEDC@gmail.com>

Hello,

I am working on the annotation of the date palm genome using the MAKER pipeline. I started by following the manual for Repeat Library Construction - Advanced. I am stuck at step 2.1.3, where muscle should be used for filtering: I don't understand what the file flankingseqfile is. How can I obtain it? Also, do you have more information about steps 2.1.4 and 2.1.5?

Thanks a lot for this great pipeline and for your help,
Muriel Gros-Balthazard

From Brian.Mack at ARS.USDA.GOV Thu Apr 17 15:34:21 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Thu, 17 Apr 2014 20:34:21 +0000
Subject: [maker-devel] tbl2asn errors
Message-ID:

Hi,

I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER gff3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to convert fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

4 ERROR: SEQ_FEAT.BadTrailingCharacter
217 ERROR: SEQ_FEAT.NoStop
438 ERROR: SEQ_FEAT.ShortIntron
171 ERROR: SEQ_FEAT.StartCodon
171 ERROR: SEQ_INST.BadProteinStart
291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
118 WARNING: SEQ_FEAT.ShortExon

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?
Thanks, Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 17 15:59:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Apr 2014 14:59:05 -0600 Subject: [maker-devel] tbl2asn errors Message-ID: The only one that may be a real error is the first one (I'm not sure what it means). You probably need to find them and open them in a viewer like apollo. The rest I would consider warnings (the NCBI tool doesn't like any weirdness or uncertainty). You often have to manually edit things to get NCBI to accept all models without complaining (sometimes even going against real biology). I know some groups use the always_complete=1 option in MAKER to force start and stop codons into every model for example (even though those forced codons are probably false). 
*Not sure about this one
--> 4 ERROR: SEQ_FEAT.BadTrailingCharacter
*These are partial genes with no stop (usually happen at the edge of contigs or near strings of NNNN)
--> 217 ERROR: SEQ_FEAT.NoStop
*These are just short introns (intron size is under control of the ab initio predictors)
--> 438 ERROR: SEQ_FEAT.ShortIntron
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN)
--> 171 ERROR: SEQ_FEAT.StartCodon
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN)
--> 171 ERROR: SEQ_INST.BadProteinStart
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence)
--> 291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence)
--> 648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
*These are just short exons (exon size is under control of the ab initio predictors)
--> 118 WARNING: SEQ_FEAT.ShortExon

You probably need to identify examples of models causing each issue, and then look at them in Apollo. Apollo lets you open tbl format and save back to it.

I imagine the coordinate change is from NCBI using a 0-based coordinate system as opposed to a 1-based system (i.e. the first base is 0 rather than 1).

Unfortunately, getting everything to go into NCBI is usually a grueling task.

--Carson

From: "Mack, Brian"
Date: Thursday, April 17, 2014 at 2:34 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] tbl2asn errors

Hi,

I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER gff3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475).
It seemed to convert fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

4 ERROR: SEQ_FEAT.BadTrailingCharacter
217 ERROR: SEQ_FEAT.NoStop
438 ERROR: SEQ_FEAT.ShortIntron
171 ERROR: SEQ_FEAT.StartCodon
171 ERROR: SEQ_INST.BadProteinStart
291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
118 WARNING: SEQ_FEAT.ShortExon

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?

Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Scott.Geib at ARS.USDA.GOV Thu Apr 17 15:59:22 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 20:59:22 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References:
Message-ID: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>

Hi Brian,

We have a tool in development to deal with this. You should not upload your MAKER output directly to NCBI; you need to filter out genes, check that things are sane, etc.

http://brianreallymany.github.io/GAG/

It is still in active development; the first full release is planned for the end of this month (if you can wait 1.5 weeks).
It has no dependencies and maintains parent/child relationships (for example, if you remove a gene, it will also remove the associated CDS/mRNA). In a release planned for the end of the month, you will be able to perform functions like removing short features, removing long features, flagging things for review, etc. It also generates an updated genome.fasta file, a gff3 file, and sequence files for CDS/mRNA/peptide based on the edits made.

Hopefully this is helpful to you.

Scott

---------- Forwarded message ----------
From: Mack, Brian
Date: Thu, Apr 17, 2014 at 10:34 AM
Subject: [maker-devel] tbl2asn errors
To: " "

Hi,

I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER gff3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to convert fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

4 ERROR: SEQ_FEAT.BadTrailingCharacter
217 ERROR: SEQ_FEAT.NoStop
438 ERROR: SEQ_FEAT.ShortIntron
171 ERROR: SEQ_FEAT.StartCodon
171 ERROR: SEQ_INST.BadProteinStart
291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
118 WARNING: SEQ_FEAT.ShortExon

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?

Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
If you believe you have received this message in error, please notify the sender and delete the email immediately. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 17 16:27:53 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Apr 2014 15:27:53 -0600 Subject: [maker-devel] tbl2asn errors In-Reply-To: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> Message-ID: Very cool. I'll try it out as well. --Carson From: "Geib, Scott" Date: Thursday, April 17, 2014 at 2:59 PM To: "Mack, Brian" , "maker-devel at yandell-lab.org" , "Brian Hall (bhall7 at hawaii.edu)" Subject: Re: [maker-devel] tbl2asn errors Hi Brian, We have a tool to deal with this in development, you should not directly upload your maker output to NCBI, you need to filter out genes, check that things are sane, etc. http://brianreallymany.github.io/GAG/ It is still in active development, first full release is planned for the end of this month (if you can wait 1.5 weeks). It has no dependencies and maintains parent/child relationships (for example if you remove a gene, it will also remove associated CDS/mRNA). In a release planned for then end of the month, you will be able to perform functions like removing short features, long features, flagging things for review, etc. It also generates an updated genome.fasta file, gff3 file, and sequences files for CDS/mRNA/peptide based on edits made. Hopefully this is helpful to you. 
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Scott.Geib at ARS.USDA.GOV Thu Apr 17 17:37:49 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 22:37:49 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
Message-ID: <0D54878997A4B9478F03938D61DB51D4266C1E@001FSN2MPN1-015.001f.mgd2.msft.net>

Just so you are not discouraged: the current version has limited functionality and is pretty much undocumented (although it will write a .tbl file). I will email the list when the first real release is complete and documented.

Scott

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Thursday, April 17, 2014 11:28 AM
To: Geib, Scott; Mack, Brian; maker-devel at yandell-lab.org; Brian Hall (bhall7 at hawaii.edu)
Subject: Re: [maker-devel] tbl2asn errors

Very cool. I'll try it out as well.

--Carson

From: "Geib, Scott"
Date: Thursday, April 17, 2014 at 2:59 PM
To: "Mack, Brian", "maker-devel at yandell-lab.org", "Brian Hall (bhall7 at hawaii.edu)"
Subject: Re: [maker-devel] tbl2asn errors

Hi Brian,

We have a tool to deal with this in development, you should not directly upload your maker output to NCBI, you need to filter out genes, check that things are sane, etc.

http://brianreallymany.github.io/GAG/

It is still in active development, first full release is planned for the end of this month (if you can wait 1.5 weeks). It has no dependencies and maintains parent/child relationships (for example if you remove a gene, it will also remove associated CDS/mRNA).
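On the off-by-one coordinates raised in this thread: if the guess above is right and the difference is simply a 1-based versus 0-based numbering of the same bases, then both ends of every interval shift by one and the feature length is unchanged. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and whether the sqn output really uses this convention is an assumption taken from the thread, not something verified here.

```python
def one_based_to_zero_based(start, end):
    """Shift a 1-based inclusive interval to a 0-based inclusive interval.

    Assumption from the thread: the same bases are simply numbered from 0
    instead of 1, so both ends move down by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end - 1


def interval_length(start, end):
    """Number of bases covered by an inclusive interval."""
    return end - start + 1


# Gene coordinates quoted in the thread (tbl file).
tbl = (440869, 441931)
sqn = one_based_to_zero_based(*tbl)

print(sqn)
print(interval_length(*tbl), interval_length(*sqn))
```

If the lengths had differed between the two files, that would point to a real change in the features rather than a change of numbering convention.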
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bioinformatics.umd at gmail.com Fri Apr 18 08:14:45 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 18 Apr 2014 09:14:45 -0400
Subject: [maker-devel] Short Introns
Message-ID:

Hello,

We are preparing two submissions for NCBI, which is a nightmare. Some of our MAKER gene models have short introns that are being flagged by NCBI. In one species we have >400 introns smaller than 20bp, which is almost biologically impossible. I know we can set the max intron length in the opts.ctl file, but can we set a minimum intron length?

I saw yesterday's posts that mention this is a result of the external ab initio predictors, but I didn't see an indication as to which predictor and how to change that setting.

from yesterday:
*These are just short introns (intron size is under control of the ab initio predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron

Cheers
Ian

From carsonhh at gmail.com Fri Apr 18 10:35:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 18 Apr 2014 09:35:51 -0600
Subject: [maker-devel] Short Introns
In-Reply-To:
References:
Message-ID:

Look at the names of those genes. The original name will let you know where each model came from because it will contain augustus, genemark, snap, etc. You will also want to open up the contig containing those genes in a viewer like apollo (http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz). See if the short intron is part of the CDS or UTR. If it's UTR, then it has evidence support from an EST, which either means there are problems with the EST/cDNA evidence or it's real.
For those, even if they are real you can just trim them off. If it's part of the CDS, then investigate whether it is suggested by EST or protein evidence, or if the ab initio predictor called it (sometimes the ab initio predictor calls things to force an ORF to work). This can sometimes be indicative of assembly issues in that region.

--Carson

On 4/18/14, 7:14 AM, "UMD Bioinformatics" wrote:

>Hello,
>
>We are preparing two submission for NCBI, nightmare. However some of our
>MAKER gene models have short introns that are being flagged by NCBI. In
>one species we have >400 introns smaller then 20bp which is almost
>biologically impossible. I know we can set max intron length in the
>opts.ctl file but can we set a minimum intron length?
>
>I saw yesterdays posts that mention this is a result of the external ab
>initio predictors but I didn't see an indication as to which predictor
>and how to change that setting.
>
>from yesterday:
>*These are just short introns (intron size is under control of the ab
>initio
>predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron
>
>Cheers
>Ian
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michael.seidl at wur.nl Tue Apr 22 09:27:18 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:27:18 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
Message-ID:

Hi,

I have a question on the post-processing of my maker output. I finished a maker run on a draft genome (231 scaffolds) without an error. To get a merged gff3 I run ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. However, I realized that, next to GFF3-conformant output, it contains thousands of lines of array refs, e.g. ARRAY(0x188a8578).
The total number of produced scaffolds is correct, however I have my doubts if I successfully retrieved all annotations...Could you maybe point me to a possible solution... Thanks in advance Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 22 09:31:16 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:31:16 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: Message-ID: I've never seen this. What version of MAKER are you using? --Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 8:27 AM To: Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Hi, I have a question on the post-processing of my maker output. I finished a maker run on a draft genome (231 scaffolds) without an error. To get a merged gff3 I run ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. However, I realized that I contains next to gff3 conform output, thousands of lines of array refs, e.g. ARRAY(0x188a8578)). The total number of produced scaffolds is correct, however I have my doubts if I successfully retrieved all annotations...Could you maybe point me to a possible solution... Thanks in advance Michael _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.seidl at wur.nl Tue Apr 22 09:37:33 2014 From: michael.seidl at wur.nl (Michael Seidl) Date: Tue, 22 Apr 2014 16:37:33 +0200 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Hi Carson, I am using maker 2.31. 
Thanks Michael On Tue, Apr 22, 2014 at 4:31 PM, Carson Holt wrote: > I've never seen this. What version of MAKER are you using? > > --Carson > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 8:27 AM > To: > > Subject: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Hi, > > I have a question on the post-processing of my maker output. I finished a > maker run on a draft genome (231 scaffolds) without an error. To get a > merged gff3 I run ~/local_progs/maker/bin/gff3_merge -d > master_datastore_index.log. However, I realized that I contains next to > gff3 conform output, thousands of lines of array refs, e.g. > ARRAY(0x188a8578)). The total number of produced scaffolds is correct, > however I have my doubts if I successfully retrieved all > annotations...Could you maybe point me to a possible solution... > > Thanks in advance > Michael > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- *Michael F Seidl, PhD* Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 22 09:39:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:39:51 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Could you check the individual contig GFF3's before merge. Do any of those contain array refs? Also is it exactly 2.31 or the current 2.31.3? 
--Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 8:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Hi Carson, I am using maker 2.31. Thanks Michael On Tue, Apr 22, 2014 at 4:31 PM, Carson Holt wrote: > I've never seen this. What version of MAKER are you using? > > --Carson > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 8:27 AM > To: > > Subject: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Hi, > > I have a question on the post-processing of my maker output. I finished a > maker run on a draft genome (231 scaffolds) without an error. To get a merged > gff3 I run ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. > However, I realized that I contains next to gff3 conform output, thousands of > lines of array refs, e.g. ARRAY(0x188a8578)). The total number of produced > scaffolds is correct, however I have my doubts if I successfully retrieved all > annotations...Could you maybe point me to a possible solution... > > Thanks in advance > Michael > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Michael F Seidl, PhD Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From michael.seidl at wur.nl Tue Apr 22 09:43:44 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:43:44 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt wrote:

> any

Dear Carson,

maker -version returns 2.31.

Yes, also the individual scaffolds seem to contain ARRAY refs, e.g. find -name "*gff" | xargs grep "ARRAY":

./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088

Cheers
M
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Tue Apr 22 09:46:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:46:34 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Thanks, Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 8:43 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt wrote: > any Dear Carson, maker -version returns 2.31. Yes, also the individual scaffolds seem to contain ARRAY refs, e.g. find -name "*gff" | xargs grep "ARRAY": ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 Cheers M -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From a.priyam at qmul.ac.uk Tue Apr 22 12:45:45 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Tue, 22 Apr 2014 23:15:45 +0530
Subject: [maker-devel] is using est_reads option safe?
Message-ID:

Hi,

I need to run MAKER against a genome with both raw (FASTQ) and assembled (FASTA) RNA-Seq data. I point MAKER to the assembled data using the est= option in maker_opts.ctl. Looking for how to point MAKER to the raw reads, I came across this thread https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where Dr. Carlson Holt points out that est_gff should be used. However, from MAKER's run log it seems that the est_reads option is not deprecated, just hidden from plain sight by excluding it from maker_opts.ctl. So I set the est_reads option in maker_opts.ctl, and MAKER parses the control files and runs just fine.

Now I am left wondering if it's safe to use est_reads. As in, could it impact the predicted set negatively?

-- Priyam

From carsonhh at gmail.com Tue Apr 22 13:02:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 12:02:56 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

The est_reads option doesn't do anything. It is in the run log for backwards compatibility with old jobs because MAKER has a restart capability (i.e. people can rerun new MAKER versions against old MAKER output in the same directory - it can reuse old raw results to avoid rerunning analysis steps). The est_reads option was originally there for developer experimentation, but then it went away.

You need to use an external tool like tophat and cufflinks to align short reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough option you mentioned). Or you can assemble them without alignment using something like trinity (then you can provide that result to the est= option because it will be in fasta format).
You should not use raw reads directly with MAKER, you need to preprocess them using one of the methods mentioned for them to be useful. Thanks, Carson On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >Hi, > >I need to run MAKER against a genome with both raw (FASTQ) and >assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >est= options in maker_opts.ctl. Looking for how to point MAKER to the >raw reads I came across this thread >https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >Dr. Carlson Holt points out that est_gff should be used. However, from >MAKER's run log it seems that est_reads option is not deprecated, just >hidden from plain sight by excluding it from maker_opts.ctl. So I set >est_reads option in maker_opts.ctl and MAKER parses the control files >and runs just fine. > >Now I am left wondering if it's safe to use est_reads. As in, could it >impact the predicted set negatively? > >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Apr 22 14:10:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 13:10:46 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl> Message-ID: The issue was indeed caused by a bug in using the other_gff= file option. Could you place the attached file in .../maker/lib/. Then you can rerun maker to test if it fixes it ('maker -a' for fast rerun without analysis rerun). 
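
One way to confirm whether the stray lines are gone after a rerun is to count them per file, in the spirit of the find/grep check used earlier in this thread. This is only a sketch: the function name count_array_lines is made up here, and it assumes the bad lines begin with ARRAY(0x, as in the reported output.

```shell
# Hypothetical helper: list .gff files under a directory that still contain
# stray Perl reference lines such as "ARRAY(0x41f6ea0)", with a per-file count.
count_array_lines() {
    dir=${1:-.}
    # /dev/null forces grep to print file names even when only one file is found;
    # files with a count of 0 (including /dev/null itself) are dropped.
    find "$dir" -type f -name '*.gff' -exec grep -c '^ARRAY(0x' /dev/null {} + |
        grep -v ':0$'
}

# Example usage from the top of a MAKER output directory:
# count_array_lines .
```

No output means no matching lines were found under the given directory.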
Alternately if you don't feel like rerunning everything, you can also filter out the lines using --> grep -v "ARRAY" file.gff Since the other_gff file is not used in any part of the analysis and is just a convenience option that prints any text given to it into the final GFF3 file, then filtering them out is the same as if you would have left other_gff blank when running MAKER. You can then use 'gff3_merge -s tophat.gff merged_genome.gff' to merge the desired extra lines back into your file outside of MAKER. Thanks, Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 12:29 PM To: Carson Holt Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Hi Carson, I uploaded the files as an archive. Thanks Michael On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt wrote: > In the base maker.output directory for the job, there will be a file with a > .db extension. Could you send that as well? I'm leaning towards this being > something odd happening with the GFF3 files used as input. Particularly the > other_gff= file. Could you upload this file as well --> > /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/toph > at.gff3. > > --Carson > > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 8:56 AM > To: Carson Holt > > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Should be uploading right now... 
> > Thanks Michael > > > > On Tue, Apr 22, 2014 at 4:46 PM, Carson Holt > > wrote: > Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it > here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > Thanks, > Carson > > > From: Michael Seidl > >> > Date: Tuesday, April 22, 2014 at 8:43 AM > To: Carson Holt > o:carsonhh at gmail.com>>> > Cc: > "maker-devel at yandell-lab.org devel at yandell-lab.org>" > devel at yandell-lab.org>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > > On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt > o:carsonhh at gmail.com>>> wrote: > any > > Dear Carson, > > maker -version returns 2.31. Yes, also the individual scaffolds seem to > contain ARRAY refs, e.g. > find -name "*gff" | xargs grep "ARRAY": > > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 > > Cheers > M > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. 
Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: michael.seidl at wur.nl > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > > -- Michael F Seidl, PhD Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GFFDB.pm Type: text/x-perl-script Size: 52152 bytes Desc: not available URL: From carsonhh at gmail.com Tue Apr 22 15:35:31 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 14:35:31 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl> Message-ID: You can provide a comma separated list of files to est_gff. Also from experience cufflinks gives far better results than tophat. Tophat tends to have a lot of false positives that adversely affect the overall quality of gene models, so I usually recommend that people use cufflinks output and not even include the tophat results in their run. Thanks, Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 2:30 PM To: Carson Holt Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Dear Carson, thanks a lot I will try. 
More importantly, you pointed me to a mistake in my procedure, which will make me rerun maker anyway :p

I want maker to use the tophat.gff alongside the cufflinks ESTs (fa + gff) as well as a protein.fa. I currently provide them as follows:

#-----EST Evidence (for best results provide a file for at least one)
est=/home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.cds.fa #set of ESTs or assembled mRNA-seq
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.gff3 #aligned ESTs or mRNA-seq from a
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/home/michael/data/side/alternaria/maker_annotation/fungal_proteins.fa #protein sequence file in fasta format (i.e. from mu
protein_gff= #aligned protein homology evidence from an external GFF3 file

Can I give the tophat.gff as altest_gff, or does maker internally use est_gff and altest_gff differently? Sorry for this question, but I had not yet realized that other_gff is ignored during the maker run.

Thanks a lot
Michael

On Tue, Apr 22, 2014 at 9:10 PM, Carson Holt wrote:
> The issue was indeed caused by a bug in using the other_gff= file option.
> Could you place the attached file in .../maker/lib/. Then you can rerun maker
> to test if it fixes it ('maker -a' for fast rerun without analysis rerun).
>
> Alternately if you don't feel like rerunning everything, you can also filter
> out the lines using --> grep -v "ARRAY" file.gff
>
> Since the other_gff file is not used in any part of the analysis and is just a
> convenience option that prints any text given to it into the final GFF3 file,
> then filtering them out is the same as if you would have left other_gff blank
> when running MAKER.
You can then use 'gff3_merge -s tophat.gff > merged_genome.gff' to merge the desired extra lines back into your file > outside of MAKER. > > Thanks, > Carson > > > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 12:29 PM > To: Carson Holt > > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Hi Carson, > > I uploaded the files as an archive. > > Thanks > Michael > > > On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt > > wrote: > In the base maker.output directory for the job, there will be a file with a > .db extension. Could you send that as well? I'm leaning towards this being > something odd happening with the GFF3 files used as input. Particularly the > other_gff= file. Could you upload this file as well --> > /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/toph > at.gff3. > > --Carson > > > From: Michael Seidl > >> > Date: Tuesday, April 22, 2014 at 8:56 AM > To: Carson Holt > o:carsonhh at gmail.com>>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Should be uploading right now... > > Thanks Michael > > > > On Tue, Apr 22, 2014 at 4:46 PM, Carson Holt > o:carsonhh at gmail.com>>> wrote: > Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it > here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > Thanks, > Carson > > > From: Michael Seidl > > l at wur.nl>>>> > Date: Tuesday, April 22, 2014 at 8:43 AM > To: Carson Holt > o:carsonhh at gmail.com>> ilto:carsonhh at gmail.com>>> > Cc: > "maker-devel at yandell-lab.org devel at yandell-lab.org> yandell-lab.org -lab.org>>" > devel at yandell-lab.org> yandell-lab.org -lab.org>>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > > On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt > o:carsonhh at gmail.com>> ilto:carsonhh at gmail.com>>> wrote: > any > > Dear Carson, > > maker -version returns 2.31. 
Yes, also the individual scaffolds seem to > contain ARRAY refs, e.g. > find -name "*gff" | xargs grep "ARRAY": > > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 > > Cheers > M > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: > michael.seidl at wur.nl mailto:michael.seidl at wur.nl>> > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: michael.seidl at wur.nl > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > -- Michael F Seidl, PhD Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. 
Box 8025, 6700 EE Wageningen
Wageningen Campus, building 107 (Radix)
Droevendaalsesteeg 1, 6708 PB Wageningen

Tel.: +31-317-481288
Fax: +31-317-483412

Email: michael.seidl at wur.nl
Website: http://www.php.wur.nl/UK/
Twitter: @MFSeidl

www.disclaimer-uk.wur.nl

From a.priyam at qmul.ac.uk Wed Apr 23 04:55:37 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Wed, 23 Apr 2014 15:25:37 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

Thanks, Carson.

I now understand that I shouldn't use the est_reads option.

Does MAKER utilise est_gff for prediction, or does it simply pass the annotations through to the output GFF? If the latter, how is it different from using other_gff / model_gff (and what's the difference between these two)?

I have both assembled and raw reads. Is it sufficient to just use the assembled set?

-- Priyam

On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote:
> The est_reads option doesn't do anything. It in the run log for backwards
> compatibility with old jobs because MAKER has a restart capability (i.e.
> people can rerun new MAKER versions against old MAKER output in the same
> directory - it can reuse old raw results to avoid rerunning analysis
> steps). The est_reads was originally there for developer experimentation,
> but then it went away.
>
> You need to use an external tool like tophat and cufflinks to align short
> reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough
> option you mentioned). Or you can assemble then without alignment using
> something like trinity (then you can provide that result to the est=
> options because it will be in fasta format).
>
> You should not use raw reads directly with MAKER, you need to preprocess
> them using one of the methods mentioned for them to be useful.
> > Thanks, > Carson > > > > On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: > >>Hi, >> >>I need to run MAKER against a genome with both raw (FASTQ) and >>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >>est= options in maker_opts.ctl. Looking for how to point MAKER to the >>raw reads I came across this thread >>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >>Dr. Carlson Holt points out that est_gff should be used. However, from >>MAKER's run log it seems that est_reads option is not deprecated, just >>hidden from plain sight by excluding it from maker_opts.ctl. So I set >>est_reads option in maker_opts.ctl and MAKER parses the control files >>and runs just fine. >> >>Now I am left wondering if it's safe to use est_reads. As in, could it >>impact the predicted set negatively? >> >>-- Priyam >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Wed Apr 23 09:43:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 23 Apr 2014 08:43:54 -0600 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: est_gff is the equivalent of est=, but because the alignment structure is already in the GFF3, I don't need to align sequence with blastn/exonerate. model_gff and pred_gff are essentially the same with the difference being that model_gff can be kept in the final results even without supporting evidence, but pred_gff won't. Pred_gff needs evidence support because it is a potential model, where model_gff is considered a known model even if the structure of that model may be uncertain. other_gff is just a convenience method for passing through GFF3 features to the final result. 
It's impossible to have MAKER be aware of every kind of possible entry, so if you have something more exotic in the final output (sequence variant information, alternate alleles, promotor and methylation site, etc.) then you can pass it in there and it will just be printed into the file. It's basically the equivalent of concatenating two GFF3 files together, but it handles the proper reordering of sequence information at the end of the GFF3 file (because technically you can't just concatenate GFF3 files end-to-end). You can also use the gff3_merge tool that comes with MAKER to get the same effect. --Carson On 4/23/14, 3:55 AM, "Anurag Priyam" wrote: >Thanks, Carson. > >I now understand that I shouldn't use est_reds options. > >Does MAKER utilise est_gff for prediction or simply passes the >annotations through to the output GFF? In that case how is it >different from using other_gff / model_gff (what's the difference >between these two?) > >I have both assembled and raw reads. Is it sufficient to just use the >assembled set? > >-- Priyam > >On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote: >> The est_reads option doesn't do anything. It in the run log for >>backwards >> compatibility with old jobs because MAKER has a restart capability (i.e. >> people can rerun new MAKER versions against old MAKER output in the same >> directory - it can reuse old raw results to avoid rerunning analysis >> steps). The est_reads was originally there for developer >>experimentation, >> but then it went away. >> >> You need to use an external tool like tophat and cufflinks to align >>short >> reads and assemble them into likely exon blocks (i.e. the GFF3 >>passthrough >> option you mentioned). Or you can assemble then without alignment using >> something like trinity (then you can provide that result to the est= >> options because it will be in fasta format). 
>>
>> You should not use raw reads directly with MAKER, you need to preprocess
>> them using one of the methods mentioned for them to be useful.
>>
>> Thanks,
>> Carson
>>
>>
>>
>> On 4/22/14, 11:45 AM, "Anurag Priyam" wrote:
>>
>>>Hi,
>>>
>>>I need to run MAKER against a genome with both raw (FASTQ) and
>>>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using
>>>est= options in maker_opts.ctl. Looking for how to point MAKER to the
>>>raw reads I came across this thread
>>>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where
>>>Dr. Carlson Holt points out that est_gff should be used. However, from
>>>MAKER's run log it seems that est_reads option is not deprecated, just
>>>hidden from plain sight by excluding it from maker_opts.ctl. So I set
>>>est_reads option in maker_opts.ctl and MAKER parses the control files
>>>and runs just fine.
>>>
>>>Now I am left wondering if it's safe to use est_reads. As in, could it
>>>impact the predicted set negatively?
>>>
>>>-- Priyam
>>>
>>>_______________________________________________
>>>maker-devel mailing list
>>>maker-devel at box290.bluehost.com
>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>

From kdelmore at zoology.ubc.ca Tue Apr 22 23:48:08 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 22 Apr 2014 21:48:08 -0700
Subject: [maker-devel] problem with dsindex
Message-ID: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>

I am having some trouble with the dsindex tool. I used the fasta_tool to split my original multifasta file and ran maker with the -base and -g flags. I then used the dsindex tool to summarize results from each fasta. The tool finished without an error message and pointed me to where the files should be but when I went to that directory there was no datastore and the index.log said that it had started on each of the fastas but not finished. I got around this problem using gff3_merge by using the -o option and providing paths to the gff files but this is not working with the fasta_merge tool. I don't want to just cat the files together because I want to be sure the merged gff and protein.fasta files are the same for downstream annotation steps. I've included examples of the commands I used below and the output from dsindex. Note that the individual fastas finished without errors and produced datastores.

I would really appreciate any input you might have with this problem and THANK YOU for developing such a user friendly pipeline.

/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides

maker/bin/maker -dsindex -fix_nucleotides
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/placed.maker.output/placed_datastore ##this directory was not generated
To access files for individual sequences use the datastore index:
/placed.maker.output/placed_master_datastore_index.log

/maker/bin/gff3_merge -o placed.gff *

/maker/bin/fasta_merge -o placed.all 1.maker.proteins.fasta 2.maker.proteins.fasta ##this did not work

From carson.holt at genetics.utah.edu Wed Apr 23 09:51:59 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 23 Apr 2014 14:51:59 +0000
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

I don't think all your contigs are finished or you did not supply the -base tag when running -dsindex. If it says STARTED rather than FINISHED, then the output files for that contig are missing from the directory it is looking at.
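
A quick way to see which sequences are stuck at STARTED is to scan the index log itself, for instance placed.maker.output/placed_master_datastore_index.log. The sketch below is not a MAKER tool: the function name unfinished_contigs is invented for illustration, and it assumes the log is tab-separated with the sequence name in the first column and the status (STARTED, FINISHED, ...) in the last.

```shell
# Hypothetical helper: print sequence names that have a STARTED entry but no
# FINISHED entry in a MAKER master datastore index log.
# Assumes tab-separated lines with the sequence name first and the status last.
unfinished_contigs() {
    awk -F '\t' '
        $NF == "STARTED"  { started[$1] = 1 }
        $NF == "FINISHED" { finished[$1] = 1 }
        END {
            for (name in started)
                if (!(name in finished)) print name
        }
    ' "$1" | sort
}

# Example usage:
# unfinished_contigs placed.maker.output/placed_master_datastore_index.log
```

An empty result would suggest every sequence that was started also reached FINISHED in that particular log.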
For example this is how you should be running everything -->

/maker/bin/fasta_tool --split placed.fasta
mpiexec -n 4 /maker/bin/maker -base placed -g 1.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 2.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 3.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 4.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 5.fasta -fix_nucleotides

Now all will write to placed.maker.output

Then you need to do this-->

maker/bin/maker -dsindex -base placed -g placed.fasta

Then it will rebuild the index for placed.maker.output/placed_master_datastore_index.log

Thanks,
Carson

On 4/22/14, 10:48 PM, "kdelmore at zoology.ubc.ca" wrote:

>I am having some trouble with the dsindex tool. I used the fasta_tool to
>split my original multifasta file and ran maker with the -base and -g
>flags. I then used the dsindex tool to summarize results from each fasta.
>The tool finished without an error message and pointed me to where the
>files should be but when I went to that directory there was no datastore
>and the index.log said that it had started on each of the fastas but not
>finished. I got around this problem using gff3_merge by using the -o
>option and providing paths to the gff files but this is not working with
>the fasta_merge tool. I don't want to just cat the files together because
>I want to be sure the merged gff and protein.fasta files are the same for
>downstream annotation steps. I've included examples of the commands I used
>below and the output from dsindex. Note that the individual fastas
>finished without errors and produced datastores.
>
>I would really appreciate any input you might have with this problem and
>THANK YOU for developing such a user friendly pipeline.
>
>/maker/bin/fasta_tool --split placed.fasta
>
>mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides
>
>maker/bin/maker -dsindex -fix_nucleotides
>STATUS: Parsing control files...
>STATUS: Processing and indexing input FASTA files... >STATUS: Setting up database for any GFF3 input... >A data structure will be created for you at: >/placed.maker.output/placed_datastore ##this directory was not generated >To access files for individual sequences use the datastore index: >/placed.maker.output/placed_master_datastore_index.log > >/maker/bin/gff3_merge -o placed.gff * > >/maker/bin/fasta_merge ?o placed.all 1.maker.proteins.fasta >2.maker.proteins.fasta ##this did not work > > > From carsonhh at gmail.com Wed Apr 23 09:57:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 23 Apr 2014 08:57:34 -0600 Subject: [maker-devel] problem with dsindex In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca> References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca> Message-ID: Also fasta_merge works differently than gff3_merge. It requires the datastore index because it is trying to find directories and the 'type' and 'group' the fasta files in those directories. Without the datastore index, it is the equivalent of 'cat file1.fa file2.fa > file3.fa'. It also requires the '-i' flag when specifying individual fasta files. --Carson On 4/22/14, 10:48 PM, "kdelmore at zoology.ubc.ca" wrote: >I am having some trouble with the dsindex tool. I used the fasta_tool to >split my original multifasta file and ran maker with the ?base and ?g >flags. I then used the dsindex tool to summarize results from each fasta. >The tool finished without an error message and pointed me to where the >files should be but when I went to that directory there was no datastore >and the index.log said that it had started on each of the fastas but not >finished. I got around this problem using gff3_merge by using the ?o >option and providing paths to the gff files but this is not working with >the fasta_merge tool. 
I don?t want to just cat the files together because >I want to be sure the merged gff and protein.fasta files are the same for >downstream annotation steps. I?ve included examples of the commands I used >below and the output from dsindex. Note that the individual fastas >finished without errors and produced datastores. > >I would really appreciate any input you might have with this problem and >THANK YOU for developing such a user friendly pipeline. > >/maker/bin/fasta_tool --split placed.fasta > >mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides > >maker/bin/maker -dsindex -fix_nucleotides >STATUS: Parsing control files... >STATUS: Processing and indexing input FASTA files... >STATUS: Setting up database for any GFF3 input... >A data structure will be created for you at: >/placed.maker.output/placed_datastore ##this directory was not generated >To access files for individual sequences use the datastore index: >/placed.maker.output/placed_master_datastore_index.log > >/maker/bin/gff3_merge -o placed.gff * > >/maker/bin/fasta_merge ?o placed.all 1.maker.proteins.fasta >2.maker.proteins.fasta ##this did not work > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From a.priyam at qmul.ac.uk Thu Apr 24 02:28:38 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Thu, 24 Apr 2014 12:58:38 +0530 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: You say est_gff is the equivalent of est= (except that alignment structure is a part of gff). What would MAKER do if I set both est= and est_gff= options in maker_opts.ctl? Will it ignore est=? -- Priyam On Wed, Apr 23, 2014 at 8:13 PM, Carson Holt wrote: > est_gff is the equivalent of est=, but because the alignment structure is > already in the GFF3, I don't need to align sequence with blastn/exonerate. 
> model_gff and pred_gff are essentially the same with the difference being > that model_gff can be kept in the final results even without supporting > evidence, but pred_gff won't. Pred_gff needs evidence support because it > is a potential model, where model_gff is considered a known model even if > the structure of that model may be uncertain. > > other_gff is just a convenience method for passing through GFF3 features > to the final result. It's impossible to have MAKER be aware of every kind > of possible entry, so if you have something more exotic in the final > output (sequence variant information, alternate alleles, promotor and > methylation site, etc.) then you can pass it in there and it will just be > printed into the file. It's basically the equivalent of concatenating two > GFF3 files together, but it handles the proper reordering of sequence > information at the end of the GFF3 file (because technically you can't > just concatenate GFF3 files end-to-end). You can also use the gff3_merge > tool that comes with MAKER to get the same effect. > > --Carson > > > > On 4/23/14, 3:55 AM, "Anurag Priyam" wrote: > >>Thanks, Carson. >> >>I now understand that I shouldn't use est_reds options. >> >>Does MAKER utilise est_gff for prediction or simply passes the >>annotations through to the output GFF? In that case how is it >>different from using other_gff / model_gff (what's the difference >>between these two?) >> >>I have both assembled and raw reads. Is it sufficient to just use the >>assembled set? >> >>-- Priyam >> >>On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote: >>> The est_reads option doesn't do anything. It in the run log for >>>backwards >>> compatibility with old jobs because MAKER has a restart capability (i.e. >>> people can rerun new MAKER versions against old MAKER output in the same >>> directory - it can reuse old raw results to avoid rerunning analysis >>> steps). 
The est_reads was originally there for developer >>>experimentation, >>> but then it went away. >>> >>> You need to use an external tool like tophat and cufflinks to align >>>short >>> reads and assemble them into likely exon blocks (i.e. the GFF3 >>>passthrough >>> option you mentioned). Or you can assemble then without alignment using >>> something like trinity (then you can provide that result to the est= >>> options because it will be in fasta format). >>> >>> You should not use raw reads directly with MAKER, you need to preprocess >>> them using one of the methods mentioned for them to be useful. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >>> >>>>Hi, >>>> >>>>I need to run MAKER against a genome with both raw (FASTQ) and >>>>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >>>>est= options in maker_opts.ctl. Looking for how to point MAKER to the >>>>raw reads I came across this thread >>>>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >>>>Dr. Carlson Holt points out that est_gff should be used. However, from >>>>MAKER's run log it seems that est_reads option is not deprecated, just >>>>hidden from plain sight by excluding it from maker_opts.ctl. So I set >>>>est_reads option in maker_opts.ctl and MAKER parses the control files >>>>and runs just fine. >>>> >>>>Now I am left wondering if it's safe to use est_reads. As in, could it >>>>impact the predicted set negatively? >>>> >>>>-- Priyam >>>> >>>>_______________________________________________ >>>>maker-devel mailing list >>>>maker-devel at box290.bluehost.com >>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> > > From carsonhh at gmail.com Thu Apr 24 09:15:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 24 Apr 2014 08:15:07 -0600 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: It will use both. 
You can also provide multiple files to either option using comma-separated lists.

--Carson

From anurag08priyam at gmail.com Thu Apr 24 09:26:24 2014
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Thu, 24 Apr 2014 19:56:24 +0530
Subject: [maker-devel] is using est_reads option safe?

That answers all my questions. Thanks, Carson.

-- Priyam

From matthew.macmanes at unh.edu Sat Apr 26 09:56:25 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Sat, 26 Apr 2014 10:56:25 -0400
Subject: [maker-devel] Use of each() on hash

Hello,

I am getting a large number of errors while running maker on my Ubuntu server.

Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x2045200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x837200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x9d1200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.

It is unclear how this affects the results or performance of the software, but these errors are repeated thousands of times in even a small run.

For the record: Maker 2.31, Ubuntu 14.04, perl 5.18.2, MPI via OpenMPI

Compiled perl modules using ./build

Thanks for any insight anyone may have.

__________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

From carsonhh at gmail.com Sat Apr 26 10:26:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 09:26:24 -0600
Subject: [maker-devel] Use of each() on hash

The message appears to be coming from forks.pm. It is probably a warning added to perl 5.18.2, which is very new (other versions don't care about this), and most developers would not consider 5.18 a fully stable release for production purposes (it will have lots of test features and messages that will get improved or dropped rather quickly). You can try updating the forks module from CPAN. Otherwise I would ignore it, as forks is sufficiently tested to know it works (it's not a MAKER module, it's a widely used CPAN module - literally tens of thousands of scripts use it worldwide). The authors of forks.pm will take steps to silence the warning rather quickly, or the warning will be removed from the perl interpreter altogether.
Thanks,
Carson

Sent from my iPhone
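For anyone who decides to ignore the warning as suggested above, one way to keep it out of the logs is to filter only that message from stderr. This is a minimal sketch, not part of MAKER or forks: the function name filter_each_warning and the invocation shown in the comment are assumptions for illustration.

```shell
# Drop only the perl 5.18 each() warning and pass every other line through.
# -F treats the pattern as a fixed string, so the parentheses are literal.
filter_each_warning() {
    grep -vF 'Use of each() on hash after insertion without resetting hash iterator' || true
}

# Hypothetical usage (bash process substitution), leaving stdout untouched:
#   maker 2> >(filter_each_warning >&2)
```

The `|| true` only keeps the function from reporting failure when every input line was a warning; real error messages on stderr are still printed.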
From cjfields at illinois.edu Sat Apr 26 22:34:16 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 27 Apr 2014 03:34:16 +0000
Subject: [maker-devel] Use of each() on hash
Message-ID: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

See this RT ticket:

https://rt.cpan.org/Public/Bug/Display.html?id=86910

The specific warning in question is there for a good reason; Reini Urban wrote about it recently and why it is bad:

http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html

There is a possible 2-line fix, mainly changing a while loop to a for loop, but the bug (originally reported in summer 2013) is still unfortunately open.

Just a note, I don't agree that perl 5.18.2 is a development release. Even-numbered minor releases (5.10, 5.12, ...) are considered stable/production; odd-numbered ones (5.19) are developer releases. I do agree that initial .0 "patch" releases (e.g. 5.18.0) are generally to be avoided, but I always try to use a more recent version of perl when possible. This version is two releases past the .0, and perl 5.20 (next stable) is due next month.

chris
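The numbering convention described above (even minor number means stable, odd means development) can be checked mechanically. A small sketch, assuming a dotted version string such as 5.18.2; the function name perl_release_kind is made up for this example.

```shell
# Classify a dotted perl version by its minor number:
# even minor (5.18.x) -> stable, odd minor (5.19.x) -> development.
perl_release_kind() {
    minor=${1#*.}        # strip the leading major number: 5.18.2 -> 18.2
    minor=${minor%%.*}   # strip the patch level:          18.2   -> 18
    if [ $((minor % 2)) -eq 0 ]; then
        echo stable
    else
        echo development
    fi
}
```

The running interpreter's version can be printed in this dotted form with `perl -e 'printf "%vd\n", $^V'` and passed to the function.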
From carsonhh at gmail.com Sat Apr 26 23:06:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:06:46 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

Yah, I had already seen that ticket. It's related to changing the function from a while loop to a foreach loop just to suppress the warning. Not sure why the forks.pm maintainer hasn't looked at it, but I imagine he will probably just do something more like --> no warnings qw(each); or whatever would suppress that warning without altering anything else in the code.

I wouldn't say 5.18 is a development release. What I said is that it's not good for 'production'. The problem is that most systems still use 5.10 and 5.12, with a very few only recently moving to 5.16 (Amazon's EC2 images, for example). So you will find that issues with even very popular CPAN modules (as we see here) will be more common in something like 5.18.X. Not because 5.18 is flawed or buggy, but because it's not yet used enough to flush out all the secondary issues it can cause elsewhere in the wider world of perl.

Thanks,
Carson

From carsonhh at gmail.com Sat Apr 26 23:51:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:51:30 -0600
Subject: [maker-devel] Use of each() on hash
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

If you don't want to wait for the forks.pm maintainer to alter his code and submit an update to CPAN, you should be able to suppress the warning by manually editing forks.pm line 1736 yourself.

Change it from this -->

    $write = each %WRITE;

To this (make sure to include the {} brackets) -->

    {
        no warnings qw(internal);
        $write = each %WRITE;
    }

The issue is that the module's author has his code calling 'each', altering the hash, and then calling 'each' again, which causes a warning in perl 5.18+. In this case it's relatively innocuous because of how the value and 'each' function are being used (any hash reordering ends up being handled in an outer while loop).

Thanks,
Carson

From muriel.grosb at gmail.com Mon Apr 28 03:35:25 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 28 Apr 2014 10:35:25 +0200
Subject: [maker-devel] Repeat Library Construction : Exclusion of gene fragments
Message-ID: <535E12CD.9020302@gmail.com>

Hello!

I ran RepeatModeler and separated the output into ModelerID.lib and Modelerunknown.lib, as explained in the protocol. In total, I have about 600 sequences in these two files.

I now want to exclude gene fragments. I downloaded all the plant protein sequences from UniProtDB and plan to use blastx. However, I don't know which parameters I should use for blastx, especially the -e value?
Thanks a lot for your help,
Muriel GB

From mhinsley at ebi.ac.uk Tue Apr 29 03:21:06 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Tue, 29 Apr 2014 09:21:06 +0100
Subject: [maker-devel] unexpected alternate splicing
Message-ID: <535F60F2.5050902@ebi.ac.uk>

Hi

We've just reinstalled maker 2.31 using mpich3 (3.1) and are delighted that file locking and other issues have been resolved. (I'm running maker across several nodes on the compute farm.) The maker code is identical: I took the previous tar.gz archive and made a clean build.

Using a copy of a previous configuration to test, the only differences I can see are that the location of some files has changed (the working directory is on a different file system) and that I'm using a bigger (unfiltered) repeat library.

The previous maker run produced 17393 genes and 17393 mRNAs, and this new version gives 15927 genes and 21328 mRNAs.

I have alt_splice=0:

$ grep splice ../maker_opts.ctl
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no

Any idea why I'm getting multiple mRNAs per gene?

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Tue Apr 29 07:59:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Apr 2014 06:59:04 -0600
Subject: [maker-devel] unexpected alternate splicing
In-Reply-To: <535F60F2.5050902@ebi.ac.uk>
References: <535F60F2.5050902@ebi.ac.uk>
Message-ID: <1653CD3E-CEB7-437E-88CC-0F65C9BDA931@gmail.com>

Are you using gff3 files as input? If so, could you send those to me? They are probably coming from those.

--carson

Sent from my iPhone

From carson.holt at genetics.utah.edu Wed Apr 30 09:53:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 30 Apr 2014 14:53:29 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1398869131512.52399@uga.edu>
References: <1398869131512.52399@uga.edu>

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani
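To compare gene and mRNA counts between two runs, as in the alt_splice thread above, one option is to tally the features straight from the GFF3 output. This is only a sketch: it assumes a standard nine-column, tab-separated GFF3 in which each mRNA carries a Parent= attribute naming its gene, and the function name and file name are hypothetical.

```shell
# Print the number of gene features, mRNA features, and genes that have
# more than one mRNA child in a GFF3 file given as the first argument.
gff3_gene_mrna_counts() {
    awk -F '\t' '
        /^##FASTA/ { exit }          # stop at the embedded sequence section
        /^#/ || NF < 9 { next }      # skip comments and malformed lines
        $3 == "gene" { genes++ }
        $3 == "mRNA" {
            mrnas++
            # pull the parent gene ID out of the attributes column
            if (match($9, /Parent=[^;]+/)) {
                parent = substr($9, RSTART + 7, RLENGTH - 7)
                if (++per_gene[parent] == 2) multi++   # count each such gene once
            }
        }
        END {
            printf "genes=%d mRNAs=%d genes_with_multiple_mRNAs=%d\n", genes, mrnas, multi
        }
    ' "$1"
}
```

Running it on the old and new output (for example `gff3_gene_mrna_counts genome.all.gff`) shows whether the extra mRNAs are spread across many genes or concentrated in a few.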
From carsonhh at gmail.com Wed Apr 30 09:55:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Apr 2014 08:55:12 -0600
Subject: [maker-devel] FW: protein2genome gene models

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From sjackman at gmail.com Wed Apr 30 18:25:17 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 30 Apr 2014 16:25:17 -0700
Subject: [maker-devel] est_forward and conflicting names

Hi, Carson.

I've downloaded a number of genes from GenBank using Entrez Direct, which I'm using with est and protein to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I'm using est_forward to retain the gene names. This workflow is working well for me.

Some of the genes pulled in from GenBank have less useful names like orf1234 or other numeric IDs. When multiple evidence sequences map to the same location, how does est_forward choose which name to use? If it's chosen arbitrarily, could it be possible to choose the most common name instead?
Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 05:23:59 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 07:23:59 -0400 Subject: [maker-devel] Installing Maker on Cray Message-ID: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at icloud.com Tue Apr 1 06:58:35 2014 From: carson.holt at icloud.com (Carson Holt) Date: Tue, 01 Apr 2014 06:58:35 -0600 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: Message-ID: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone > On Apr 1, 2014, at 5:23 AM, Matthew MacManes wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. 
Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 10:11:55 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 12:11:55 -0400 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt wrote: > Create a soft link called mpicc. I can't guarantee shared libraries are > installed on you system though as not all system derived versions of MPICH2 > have been configured with shared libraries. 
> > --Carson > > > > Sent from my iPhone > > On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray > supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's > own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the > location of mpicc - I can give the full path to Cray equivalent cc, but it > is not recognized. Many other programs allow me to specify the c compiler, > e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > *Matthew MacManes*, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Apr 1 10:29:40 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Apr 2014 16:29:40 +0000 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: <350474CE-B7EB-4EFF-9C8B-AD71FBB81CA3@illinois.edu> We might be interested in that ourselves at some point: https://bluewaters.ncsa.illinois.edu chris On Apr 1, 2014, at 11:11 AM, Matthew MacManes > wrote: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ Matthew MacManes, Ph.D. 
University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt > wrote: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ Matthew MacManes, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason at bioperl.org Tue Apr 1 12:39:14 2014 From: jason at bioperl.org (Jason Stajich) Date: Tue, 1 Apr 2014 11:39:14 -0700 Subject: [maker-devel] maker to EvidenceModeler In-Reply-To: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> References: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> Message-ID: I've used this script I wrote to make the necessary input files from maker GFF3. https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/maker2evm.pl Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Tue, Mar 25, 2014 at 9:33 AM, dhivya arasappan wrote: > Hi Carson and others, > > Is there an easy tool/pipeline available as part of maker utilities to > convert maker and SNAP output to files acceptable by EvidenceModeler? > > It looks like it also needs just gff files, but with a few tweaks. > EvidenceModeler seems better equipped to handle PASA annotation results > than maker results. > > Thanks > Dhivya > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 1 12:36:44 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 01 Apr 2014 12:36:44 -0600 Subject: [maker-devel] Missing UTRs in GFF In-Reply-To: References: Message-ID: It was indeed caused by the correct_est_fusion=1 option (which is supposed to trim off UTR if it appears overlap of UTR across genes is caused by merged mRNAseq). I have attached a patch that is used to replace .../maker/lib/maker/auto_annotator.pm, and I've updated the website download to include the patch as in MAKER download version 2.31.3. 
Thanks, Carson From: Benjamin Rubin Date: Tuesday, April 1, 2014 at 9:21 AM To: Carson Holt Subject: Re: [maker-devel] Missing UTRs in GFF OK, I think I uploaded everything. I included a cleaned up version of the control file without all of my paths in case that is useful. Thanks, Ben On Tue, Apr 1, 2014 at 9:50 AM, Carson Holt wrote: > Could upload your input fasta and hmm files as well. Sometimes I can > reproduce errors using just the raw reports, but it looks like I will need the > input files. > > --Carson > > > From: Benjamin Rubin > Date: Tuesday, April 1, 2014 at 8:38 AM > To: Carson Holt > Subject: Re: [maker-devel] Missing UTRs in GFF > > Hi Carson, > > I tried using version 2.31 on a scaffold where this problem occurred with 2.30 > and got the same result, unfortunately. I did use corr_est_fusion=1 both times > so this might be related. I have uploaded the sequence for this scaffold and > the output directory under username "brubin". Is this the data that you meant? > > I am also reattaching information on a representative problem gene from this > scaffold that occurs at base 1330779. > > Thanks so much for the help, > Ben > > > On Mon, Mar 31, 2014 at 9:37 AM, Carson Holt wrote: >> Not something I've seen before, but there was a patch for another issue that >> was cause by the use of avoid_est_fusion=1, that may be related. Try the >> current stable release 2.31, and let me know if it still happens. >> >> You can also upload the contig folder from one of the regions in question >> here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then I could verify the bug, and see if it is something that happens in the >> current release. >> >> --Carson >> >> >> From: Benjamin Rubin >> Date: Saturday, March 29, 2014 at 10:24 AM >> To: >> Subject: [maker-devel] Missing UTRs in GFF >> >> I have annotated a eukaryotic genome with MAKER 2.30. 
I recently realized >> that there are a few genes in the GFF file produced by gff3_merge with >> inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs >> have their own lines in the GFF file. However, for the problematic genes, the >> UTRs are not specified in the GFF file and all exons are annotated as CDS. >> The UTRs do appear in the gene header and the protein sequences are the >> correct length (do not include the UTR). I have attached an example from the >> GFF file. >> >> Is this a known problem, or have I done something wrong? Is there an easy way >> to fix the GFF file? >> >> Thanks for your help, >> Ben >> >> -- >> _____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > > > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > benrubin.org > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: auto_annotator.pm Type: text/x-perl-script Size: 101567 bytes Desc: not available URL: From amelia.ireland at gmod.org Thu Apr 3 15:10:53 2014 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Thu, 3 Apr 2014 14:10:53 -0700 Subject: [maker-devel] GMOD Online Training 2014 Message-ID: Greetings GMOD community! Applications are now open for the 2014 GMOD online training course, to be held from May 19th - 23rd 2014. The course will cover the installation, configuration, and usage of core GMOD software, including GBrowse and JBrowse, Galaxy, MAKER, Tripal, WebApollo, Canto, and the Chado database. The course is taught by experienced instructors and developers with deep knowledge of the tools. Although the course will be run online, students will be able to interact with the tutors and fellow attendees, ask questions, and so on. For more information and to apply, please see http://gmod.org/wiki/GMOD_Online_Training_2014 If you have any questions, please contact the GMOD help desk at help at gmod.org. Thanks! -- Amelia Ireland GMOD Community Support Generic Model Organism Database project http://gmod.org || @gmodproject -------------- next part -------------- An HTML attachment was scrubbed... URL: From Brian.Mack at ARS.USDA.GOV Mon Apr 7 06:55:01 2014 From: Brian.Mack at ARS.USDA.GOV (Mack, Brian) Date: Mon, 7 Apr 2014 12:55:01 +0000 Subject: [maker-devel] maker_functional_gff Message-ID: Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. 
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I'm doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 7 08:58:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 08:58:20 -0600 Subject: [maker-devel] maker_functional_gff Message-ID: maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder. --Carson From: "Mack, Brian" Date: Monday, April 7, 2014 at 6:55 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] maker_functional_gff Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50.
I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I'm doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 7 09:02:55 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 09:02:55 -0600 Subject: [maker-devel] maker_functional_gff In-Reply-To: References: Message-ID: I added a line to look for the UniRef header format in the attached scripts. Go ahead and give it a try.
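For readers without the scrubbed attachment, the header rewrite suggested earlier in this thread (Tax= to OS=, RepID= to GN=, plus a placeholder PE= at the end) could be sketched roughly as below. This is not part of MAKER: the function name and the placeholder value "1" are assumptions made for illustration, and whether a given maker_functional_gff version accepts the rewritten headers is untested here.

```python
import re


def uniref_to_uniprot_style(line, pe_placeholder="1"):
    """Rewrite one UniRef FASTA header so it carries OS=, GN= and PE= tags.

    Sequence lines (anything not starting with '>') are returned unchanged.
    pe_placeholder is an arbitrary stand-in value, not a real evidence level.
    """
    if not line.startswith(">"):
        return line
    header = line.rstrip("\n")
    header = re.sub(r"\bTax=", "OS=", header)    # taxon name -> organism tag
    header = re.sub(r"\bRepID=", "GN=", header)  # representative ID -> gene-name tag
    if " PE=" not in header:
        header += " PE=" + pe_placeholder        # placeholder only
    return header


example = ">UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI"
print(uniref_to_uniprot_style(example))
```

Applied line by line to a copy of the UniRef FASTA file, this leaves the sequences alone and only touches headers; the original database file is best kept untouched so the BLAST hit IDs still match.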
--Carson From: Carson Holt Date: Monday, April 7, 2014 at 8:58 AM To: "Mack, Brian" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker_functional_gff maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder. --Carson From: "Mack, Brian" Date: Monday, April 7, 2014 at 6:55 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] maker_functional_gff Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I'm doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients.
Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_functional_fasta Type: application/octet-stream Size: 3451 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_functional_gff Type: application/octet-stream Size: 4102 bytes Desc: not available URL: From darasappan at gmail.com Mon Apr 7 09:57:08 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 7 Apr 2014 10:57:08 -0500 Subject: [maker-devel] keep_preds parameter Message-ID: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Hello, I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?
Thanks dhivya From dence at genetics.utah.edu Mon Apr 7 10:06:15 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 7 Apr 2014 16:06:15 +0000 Subject: [maker-devel] keep_preds parameter In-Reply-To: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: Hi Dhivya, That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with and AED equal to zero have no support from the evidence sets (protein and EST and alt_EST). ~Daniel On Apr 7, 2014, at 9:57 AM, dhivya arasappan wrote: > Hello, > > I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1? > > Thanks > dhivya > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From darasappan at gmail.com Mon Apr 7 10:31:55 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 7 Apr 2014 11:31:55 -0500 Subject: [maker-devel] keep_preds parameter In-Reply-To: References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: Thank you Daniel. But I thought an AED score of zero indicates complete agreement of annotation to evidence and that 1 would mean no agreement?
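To make the AED question concrete, here is a rough sketch of pulling the _AED attribute out of a MAKER GFF3 file and listing the unsupported models. It assumes the convention stated in the question above (0 = complete agreement with the evidence, 1 = no agreement) and assumes _AED appears as a key=value pair in column 9 of mRNA features; the function name and the sample records are invented for illustration.

```python
def aed_by_feature(gff_lines, feature_type="mRNA"):
    """Return {feature ID: AED} for GFF3 records of feature_type that carry _AED."""
    scores = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # also skips any embedded FASTA section
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        if "ID" in attrs and "_AED" in attrs:
            scores[attrs["ID"]] = float(attrs["_AED"])
    return scores


# Invented sample records, tab-separated as in GFF3.
sample = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1;Name=g1",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.05;_eAED=0.05",
    "chr1\tmaker\tmRNA\t2000\t2600\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=1.00;_eAED=1.00",
]

scores = aed_by_feature(sample)
# Under the assumed convention, AED of 1 means no evidence overlaps the model.
unsupported = sorted(fid for fid, aed in scores.items() if aed >= 1.0)
print(unsupported)
```

With a real file, the lines would come from open("maker.gff") rather than the sample list.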
Dhivya On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote: > Hi Dhivya, > > That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with and AED equal to zero have no support from the evidence sets (protein and EST and alt_EST). > > ~Daniel > On Apr 7, 2014, at 9:57 AM, dhivya arasappan > wrote: > >> Hello, >> >> I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1? >> >> Thanks >> dhivya >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Mon Apr 7 10:33:59 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 10:33:59 -0600 Subject: [maker-devel] keep_preds parameter In-Reply-To: References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: True. Daniel had the numbers backwards (I often accidentally do that as well). --Carson On 4/7/14, 10:31 AM, "dhivya arasappan" wrote: >Thank you Daniel. But I thought an AED score of zero indicates complete >agreement of annotation to evidence and that 1 would mean no agreement?
> >Dhivya > >On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote: > >> Hi Dhivya, >> >> That's a correct understanding of keep_preds, and it is a binary >>parameter; you either tell MAKER to keep the unsupported predictions or >>not to keep the unsupported predictions. In the output, you can tell >>which genes are supported by the _AED attribute in the gff3 file. Genes >>with and AED equal to zero have no support from the evidence sets >>(protein and EST and alt_EST). >> >> ~Daniel >> On Apr 7, 2014, at 9:57 AM, dhivya arasappan >> wrote: >> >>> Hello, >>> >>> I'm looking for a little more explanation about keep_preds parameter. >>>The documentation says that it is a threshold to add unsupported gene >>>predictions. Along with some other changes, I set keep_preds=1 and saw >>>a huge jump in the number of genes I was getting. Is setting this >>>parameter to 1 equivalent to saying, include all predicted genes in my >>>output, even if they are not supported by my set or protein data? Is >>>there a way to tell from my output which genes are unsupported and >>>which are not? Also, are the only two options for this parameter 0 and >>>1? >>> >>> Thanks >>> dhivya >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nextgen.usfs at gmail.com Mon Apr 7 16:34:32 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Mon, 7 Apr 2014 17:34:32 -0500 Subject: [maker-devel] fasta_merge ARRAY error Message-ID: Hello, I'm getting an error when running fasta_merge as follows: Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942.
The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file? Thanks, Jon From dence at genetics.utah.edu Mon Apr 7 19:23:07 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 8 Apr 2014 01:23:07 +0000 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: Hi Jon, Will you please send the command that gave you that error? Also, will you upload the maker control files you used and the gff3 file to the URL below? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 Also, which version of MAKER are you using? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS Ion PGM [nextgen.usfs at gmail.com] Sent: Monday, April 07, 2014 4:34 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] fasta_merge ARRAY error Hello, I'm getting an error when running fasta_merge as follows: Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942. The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file?
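On the question of getting CDS out of the GFF file by another route, one possible approach is sketched below: group the CDS records by their Parent transcript, splice the segments in genomic order, and reverse-complement minus-strand transcripts. This is an independent sketch, not a MAKER utility. It assumes standard GFF3 (1-based inclusive coordinates, Parent= on CDS records) and a genome already loaded into a dict of name to sequence; the function name and toy data are invented.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_cds(gff_lines, genome):
    """Return {transcript ID: spliced CDS sequence} from GFF3 CDS records.

    genome maps sequence names to strings; coordinates are 1-based, inclusive.
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # non-CDS records and any embedded FASTA lines are skipped
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                segments[parent].append(
                    (cols[0], int(cols[3]), int(cols[4]), cols[6])
                )
    cds = {}
    for parent, parts in segments.items():
        parts.sort(key=lambda p: p[1])  # ascending genomic start
        seq = "".join(genome[name][start - 1:end] for name, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        cds[parent] = seq
    return cds


# Toy data: one plus-strand and one minus-strand transcript on a 19 bp contig.
genome = {"chr1": "AAATGCCCGGGTTTTAACC"}
gff = [
    "chr1\tmaker\tCDS\t3\t5\t.\t+\t0\tID=t1:cds;Parent=t1",
    "chr1\tmaker\tCDS\t9\t11\t.\t+\t0\tID=t1:cds;Parent=t1",
    "chr1\tmaker\tCDS\t12\t14\t.\t-\t0\tID=t2:cds;Parent=t2",
    "chr1\tmaker\tCDS\t16\t18\t.\t-\t0\tID=t2:cds;Parent=t2",
]
cds = extract_cds(gff, genome)
print(cds)
```

Because tRNA models carry no CDS records, they simply do not appear in the result, which may be one reason sequence counts differ from gene counts taken from the GFF3.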
Thanks, Jon _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Apr 7 20:02:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 20:02:30 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: What version of MAKER are you using, and did you run with the new trnascan option turned on? Basically the script is finding a fasta file for transcripts but the file for proteins is missing. Turning trnascan on can do this (obviously tRNAs can encode transcripts but don't encode proteins). The version of fasta_merge included in the current MAKER 2.31.3 download should handle this correctly. --Carson On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >Hi Jon, Will you please send the command that gave you that error? Also, >will you upload the maker control files you used and the gff3 file to the >URL below? > >http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 > >Also, which version of MAKER are you using? > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >Ion PGM [nextgen.usfs at gmail.com] >Sent: Monday, April 07, 2014 4:34 PM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] fasta_merge ARRAY error > >Hello, > >I'm getting an error when running fasta_merge as follows: > >Can't use an undefined value as an ARRAY reference at >/home/ngs/maker/bin/fasta_merge line 116, line 1942. > >The result is that the fasta files are somewhat truncated, that is they >do not match the gff3 file created from gff3_merge (which does run >without any errors). Seems like it is getting stuck somewhere and then >crashes.
Is there another way to easily get the CDS out of the maker >generated GFF file? > >Thanks, > >Jon > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nextgen.usfs at gmail.com Tue Apr 8 06:56:22 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Tue, 8 Apr 2014 07:56:22 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Hi Carson and Daniel, I'm running Maker 2.31.2 and yes I did have tRNAscan turned on - so perhaps I should just get fasta_merge from 2.31.3 and give it a shot. But first to clarify, fasta_merge -d maker1_master_datastore_index.log - returns the appropriate files, however both the maker.all.proteins.fasta and maker.all.transcripts.fasta return 7401 with a grep command counting ">", while the gff3_merge -d maker1_master_datastore_index.log runs without failure and a grep command counting "gene" returns 7525 models. I uploaded the files requested below. Thanks for the help. -Jon On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > What version of MAKER are you using, and did you run with the new trnascan > option turned on? Basically the script is finding a fasta file for > transcripts but the file for proteins is missing. Turning trnascan on can > do this (obviously tRNAs can encode transcripts but don't encode > proteins). The version of fasta_merge included in the current MAKER > 2.31.3 download should handle this correctly. > > --Carson > > > > On 4/7/14, 7:23 PM, "Daniel Ence" wrote: > >> Hi Jon, Will you please send the command that gave you that error?
Also, >> will you upload the maker control files you used and the gff3 file to the >> URL below? >> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >> >> Also, which version of MAKER are you using? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >> Ion PGM [nextgen.usfs at gmail.com] >> Sent: Monday, April 07, 2014 4:34 PM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] fasta_merge ARRAY error >> >> Hello, >> >> I?m getting an error when running fasta_merge as follows: >> >> Can't use an undefined value as an ARRAY reference at >> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >> >> The result is that the fasta files are somewhat truncated, that is they >> do not match the gff3 file created from gff3_merge (which does run >> without any errors). Seems like it is getting stuck somewhere and then >> crashes. Is there another way to easily get the CDS out of the maker >> generated GFF file? 
>> >> Thanks, >> >> Jon >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Tue Apr 8 08:54:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 08:54:05 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: I've attached the fixed version (I see that the patched one is not in 2.31.3, but I'll get that taken care of). The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other files will have only the coding genes. --Carson On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: >Hi Carson and Daniel, >I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >returns the appropriate files, however both the maker.all.proteins.fasta >and maker.all.transcripts.fasta return 7401 with a grep command counting >?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >without failure and a grep command counting ?gene? returns 7525 models. > >I uploaded the files requested below. Thanks for the help. > >-Jon > > >On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > >> What version of MAKER are you using, and did you run with the new >>trnascan >> option turned on? Basically the script is finding a fasta file for >> transcripts but the file for proteins is missing. Turning trnascan on >>can >> do this (obviously tRNAs can encode transcripts but don't encode >> proteins). 
The version of fasta_merge included in the current MAKER >> 2.31.3 download should handle this correctly. >> >> --Carson >> >> >> >> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >> >>> Hi Jon, Will you please send the command that gave you that error? >>>Also, >>> will you upload the maker control files you used and the gff3 file to >>>the >>> URL below? >>> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>> >>> Also, which version of MAKER are you using? >>> >>> Thanks, >>> Daniel >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>USFS >>> Ion PGM [nextgen.usfs at gmail.com] >>> Sent: Monday, April 07, 2014 4:34 PM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] fasta_merge ARRAY error >>> >>> Hello, >>> >>> I?m getting an error when running fasta_merge as follows: >>> >>> Can't use an undefined value as an ARRAY reference at >>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>> >>> The result is that the fasta files are somewhat truncated, that is they >>> do not match the gff3 file created from gff3_merge (which does run >>> without any errors). Seems like it is getting stuck somewhere and then >>> crashes. Is there another way to easily get the CDS out of the maker >>> generated GFF file? 
>>> >>> Thanks, >>> >>> Jon >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: fasta_merge Type: application/octet-stream Size: 2977 bytes Desc: not available URL: From nextgen.usfs at gmail.com Tue Apr 8 10:01:18 2014 From: nextgen.usfs at gmail.com (Jon Palmer) Date: Tue, 08 Apr 2014 11:01:18 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: <53441D4E.2070502@gmail.com> Thanks Carson, error is gone and is now working. Thanks for a great tool and for the fantastic support! -Jon On 04/08/2014 09:54 AM, Carson Holt wrote: > I've attached the fixed version (I see that the patched one is not in > 2.31.3, but I'll get that taken care of). > > The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other > files will have only the coding genes. > > --Carson > > > > On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: > >> Hi Carson and Daniel, >> I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >> perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >> But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >> returns the appropriate files, however both the maker.all.proteins.fasta >> and maker.all.transcripts.fasta return 7401 with a grep command counting >> ?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >> without failure and a grep command counting ?gene? returns 7525 models. >> >> I uploaded the files requested below. Thanks for the help. 
>> >> -Jon >> >> >> On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: >> >>> What version of MAKER are you using, and did you run with the new >>> trnascan >>> option turned on? Basically the script is finding a fasta file for >>> transcripts but the file for proteins is missing. Turning trnascan on >>> can >>> do this (obviously tRNAs can encode transcripts but don't encode >>> proteins). The version of fasta_merge included in the current MAKER >>> 2.31.3 download should handle this correctly. >>> >>> --Carson >>> >>> >>> >>> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >>> >>>> Hi Jon, Will you please send the command that gave you that error? >>>> Also, >>>> will you upload the maker control files you used and the gff3 file to >>>> the >>>> URL below? >>>> >>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>>> >>>> Also, which version of MAKER are you using? >>>> >>>> Thanks, >>>> Daniel >>>> >>>> >>>> Daniel Ence >>>> Graduate Student >>>> Eccles Institute of Human Genetics >>>> University of Utah >>>> 15 North 2030 East, Room 2100 >>>> Salt Lake City, UT 84112-5330 >>>> ________________________________________ >>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>> USFS >>>> Ion PGM [nextgen.usfs at gmail.com] >>>> Sent: Monday, April 07, 2014 4:34 PM >>>> To: maker-devel at yandell-lab.org >>>> Subject: [maker-devel] fasta_merge ARRAY error >>>> >>>> Hello, >>>> >>>> I?m getting an error when running fasta_merge as follows: >>>> >>>> Can't use an undefined value as an ARRAY reference at >>>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>>> >>>> The result is that the fasta files are somewhat truncated, that is they >>>> do not match the gff3 file created from gff3_merge (which does run >>>> without any errors). Seems like it is getting stuck somewhere and then >>>> crashes. Is there another way to easily get the CDS out of the maker >>>> generated GFF file? 
>>>> >>>> Thanks, >>>> >>>> Jon >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> From sjackman at gmail.com Tue Apr 8 13:21:38 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 8 Apr 2014 12:21:38 -0700 Subject: [maker-devel] Changing rmlib runs RepeatRunner Message-ID: Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 8 14:00:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 14:00:11 -0600 Subject: [maker-devel] Changing rmlib runs RepeatRunner In-Reply-To: References: Message-ID: RepeatRunner runs on what was not masked by RepeatMasker, so changing rmlib can cause RepeatRunner to give slightly different results because RepeatMasker results changed. --Carson From: Shaun Jackman Date: Tuesday, April 8, 2014 at 1:21 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Changing rmlib runs RepeatRunner Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
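For the fasta_merge thread above, the count comparison Jon describes (7401 FASTA records versus 7525 gene features) can be made a little more precise than a plain grep. The following is only a rough sketch, not part of MAKER: it assumes a tab-delimited GFF3 with nine columns and assumes that tRNAscan-derived transcripts carry the type `tRNA` in column 3, which should be checked against the actual file. It tallies column 3 exactly, because a grep for "gene" can also match lines where the word only appears in the attributes column.

```python
def count_fasta_records(fasta_text):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_gff3_types(gff3_text):
    """Tally column 3 (feature type) of a GFF3 file.

    Comment lines are skipped and parsing stops at the embedded
    ##FASTA section, so sequence headers are never counted as features.
    """
    counts = {}
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[2]] = counts.get(cols[2], 0) + 1
    return counts
```

If the difference between the `gene` count and the number of records in maker.all.transcripts.fasta equals the `tRNA` count, that would be consistent with Carson's explanation that the tRNA genes are written to maker.trnascan.transcripts.fasta instead.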
From sjackman at gmail.com Thu Apr 10 12:34:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 11:34:34 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER
Message-ID:

The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA and CDS features, but it does not include gene or exon features, and so it does not appear to be working with MAKER. Has anyone else used GlimmerHMM with MAKER, and how did you deal with this issue?

Cheers,
Shaun

From carsonhh at gmail.com Thu Apr 10 12:53:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 12:53:55 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

Make sure it's not GTF or GFF2, but if it is GFF3 you can substitute match for mRNA and match_part for CDS. Then it will be interpreted as a two-level alignment feature which can be given to pred_gff.

--Carson

From sjackman at gmail.com Thu Apr 10 15:32:55 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 14:32:55 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

Thanks, Carson. That helps.
I'm trying to do a completely ab initio gene annotation without any est or protein homology evidence, at least for now. The GFF file produced by maker is empty. How do I carry the GlimmerHMM pred_gff (or model_gff) annotations through to the end? Ultimately, I'd like to merge annotations from multiple ab initio predictions.

Cheers,
Shaun

From carsonhh at gmail.com Thu Apr 10 15:35:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 15:35:17 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

keep_preds=1 will force MAKER to keep ab initio results even if there is no evidence supporting them.

--Carson

From sjackman at gmail.com Thu Apr 10 16:51:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 15:51:34 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

That worked! Thanks again, Carson. A note for the record: I found that keep_preds=1 carries forward pred_gff annotations, but not model_gff annotations when that GFF file uses match and match_part annotations (like a munged GlimmerHMM GFF file), which makes sense I guess now that I think about it.

Cheers,
Shaun
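The substitution Carson suggests earlier in this thread (match for mRNA, match_part for CDS) can be scripted. What follows is a minimal sketch under stated assumptions, not a MAKER utility: the function name `glimmer_to_match` is made up for illustration, and it assumes the GlimmerHMM output is already tab-delimited GFF3 in which each CDS line points at its mRNA through a `Parent=` attribute. Only column 3 is rewritten; everything else passes through unchanged.

```python
# Feature-type substitutions described in the thread.
TYPE_MAP = {"mRNA": "match", "CDS": "match_part"}


def glimmer_to_match(gff3_text):
    """Rewrite mRNA/CDS feature types to match/match_part.

    Comment lines and lines that do not have nine tab-separated
    columns are left untouched.
    """
    out = []
    for line in gff3_text.splitlines():
        cols = line.split("\t")
        if not line.startswith("#") and len(cols) == 9:
            cols[2] = TYPE_MAP.get(cols[2], cols[2])
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)
```

As reported in the thread, the converted file would then go to pred_gff (not model_gff), with keep_preds=1 set if there is no supporting evidence.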
From carsonhh at gmail.com Thu Apr 10 16:55:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 16:55:07 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

The model_gff option can only take gene/mRNA/exon/CDS features, and will ignore match/match_part features. It's a little more restrictive.

--Carson

From rbharris at uw.edu Mon Apr 14 19:45:13 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Mon, 14 Apr 2014 18:45:13 -0700
Subject: [maker-devel] empty genome.ann/genome.dna
Message-ID:

Hi,

I recently set up MAKER on a new computer and am having trouble running a dataset that was run successfully on a different computer. After MAKER finished, I ran gff3_merge and maker2zff, and they returned empty genome.ann and genome.dna files. I have tried installing older versions of dependencies and have tinkered with the control files but I still can't figure out what the issue is. The only difference I can find is that the .all.gff file from a successful run has lines at the beginning of the file reporting the success of exonerate. In the failing run these are not reported - it just goes straight to fasta output. However, exonerate appears to work successfully when run outside of the maker pipeline.

Any help would be greatly appreciated.

Thanks!
Rebecca
From carsonhh at gmail.com Tue Apr 15 09:33:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Apr 2014 09:33:45 -0600
Subject: [maker-devel] empty genome.ann/genome.dna
In-Reply-To:
References:
Message-ID:

Could you upload your control files and job input files here--> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

I'll take a look to see if there is any problem with your job's setup. Also what version of MAKER are you running?

--Carson
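One way to make the comparison Rebecca describes concrete is to tally which evidence sources appear in the merged GFF3 from the working run and from the failing run. This is only a diagnostic sketch and does not identify the cause: the helper names are invented here, it assumes standard nine-column GFF3, and the source names that show up (column 2) depend entirely on how the job was configured.

```python
from collections import Counter


def tally_sources(gff3_text):
    """Count feature lines per (source, type) pair, i.e. columns 2 and 3.

    Stops at the embedded ##FASTA section and skips comment lines.
    """
    tally = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            tally[(cols[1], cols[2])] += 1
    return tally


def missing_sources(good_tally, bad_tally):
    """Return sources present in the working run but absent from the failing one."""
    good = {source for source, _ in good_tally}
    bad = {source for source, _ in bad_tally}
    return sorted(good - bad)
```

If the exonerate-derived alignments are what differ between the two machines, they should show up here as a source that exists in one file and not in the other.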
From bioinformatics.umd at gmail.com Tue Apr 15 11:01:37 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Tue, 15 Apr 2014 13:01:37 -0400
Subject: [maker-devel] passing names from a gff to new predictions
Message-ID: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>

Hello

I have an interesting issue with an existing Maker gff. I have a gff file with human friendly names that I would like to pass to the new predictions. However, some of those genes in the human friendly gff file are incorrect or have errors. If I use the gff as model_gff or pred_gff with map_forward=1 the names move but so do the incorrect models. Maker simply duplicates these predictions to the new outputs. If I remove the GFF file from the ctl file I get new predictions that have the necessary corrections, but they now have unfriendly names. Do you have any suggestions on how to associate the old names with the new predictions? I could simply blast the old proteins vs the new ones and associate them in that manner, but I was wondering if there were any other options within Maker.

Since I have the GFF files I also have the associated transcripts and proteins. Do I need to do some iteration of est2/genome then generate a new model gff file?

The issue we are dealing with is thousands of short introns in our gff file. These are less than 20 bp and are not biologically feasible so we are trying to correct the gene model predictions.

Cheers
Ian

From carsonhh at gmail.com Tue Apr 15 11:31:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Apr 2014 11:31:35 -0600
Subject: [maker-devel] passing names from a gff to new predictions
In-Reply-To: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>
References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>
Message-ID:

If you give anything to pred_gff or model_gff then it is allowed to compete as a predictor and thus can end up in the final results.
You stated that the models you are passing in have errors, and you don't want them to be allowed to compete and end up in your final models, correct?

MAKER is not made to expect erroneous input, so I don't have an easy solution for you (I do have a less easy solution though, but you will need to do some editing of the MAKER code).

1. Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or vi.
2. Search for the 'best_annotations' subroutine (around line 1248 depending on which version of MAKER you have).
3. Then edit it as follows:

This is how the top section of the subroutine should look at first -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = @{$CTL_OPT->{_predictor}};

        ...

Change it to this -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}};

        ...

Now run maker again with your old GFF3 file as input to model_gff, and just remember to change the MAKER code back to the way it was when you're done with everything. Basically the change will hard filter model_gff results from being allowed into your final annotations. So names will still move from model_gff to your final results with the map_forward=1 option but none of the old models will make it as gene/mRNA/exon/CDS features in the final GFF3 (they will still be listed as match/match_part reference features though).

Thanks,
Carson

From bioinformatics.umd at gmail.com Tue Apr 15 11:54:00 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Tue, 15 Apr 2014 13:54:00 -0400
Subject: [maker-devel] passing names from a gff to new predictions
In-Reply-To:
References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>
Message-ID: <31BC21FD-D9D6-4B66-B0D7-C48FBC3B7A98@gmail.com>

Carson,

That seems to fix this issue. Thanks for the insight; not something I would have ever come up with.

Cheers
Ian

From robert.king at rothamsted.ac.uk Wed Apr 16 05:27:09 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Wed, 16 Apr 2014 11:27:09 +0000
Subject: [maker-devel] scalar text in maker transcripts
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8DAC@rothex1.rothamsted.ac.uk>

Hi,

I've got some strange characters in my maker transcripts (I used keep predictions). I opened the file in wordpad

ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC
AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad
88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT

Any ideas what may cause this?

Thanks
Rob

--
This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning.
From carsonhh at gmail.com Wed Apr 16 15:56:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Apr 2014 15:56:25 -0600
Subject: [maker-devel] scalar text in maker transcripts
Message-ID:

The only time I have seen this is when fgenesh is used as a predictor and correct_est_fusion=1 is set (it was a bug in trimming long UTR's on fgenesh models). Is that how you have your job configured? If so, that particular bug is fixed in the current MAKER release.

Thanks,
Carson

From robert.king at rothamsted.ac.uk Wed Apr 16 15:57:44 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Wed, 16 Apr 2014 21:57:44 +0000
Subject: [maker-devel] scalar text in maker transcripts
In-Reply-To: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
References: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8E85@rothex1.rothamsted.ac.uk>

Yep I am. I'll try upgrading.
Thanks
Rob
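
The `SCALAR(0x...)` text Rob describes looks like a Perl reference printed in string context. As a rough way to check whether a transcript FASTA file still contains such text after upgrading, here is a minimal Python sketch. It is not part of MAKER; the function name and the record IDs in the usage note are hypothetical. It joins the wrapped sequence lines of each record before searching, because in Rob's example one reference is split across two lines.

```python
import re

# Perl prints a reference used as a string as TYPE(0xHEX), e.g. SCALAR(0xf5397d8).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE|REF|GLOB)\(0x[0-9a-fA-F]+\)")


def find_perl_refs(fasta_lines):
    """Return {record_id: hit_count} for FASTA records containing Perl refs."""
    hits = {}
    record_id = None
    chunks = []

    def flush():
        # Search the joined sequence so refs broken by line wrapping are found.
        if record_id is None:
            return
        count = len(PERL_REF.findall("".join(chunks)))
        if count:
            hits[record_id] = count

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            # Use the first word of the header as the record ID.
            record_id = (line[1:].split() or [""])[0]
            chunks = []
        elif line:
            chunks.append(line)
    flush()
    return hits
```

Typical use would be `find_perl_refs(open("transcripts.fasta"))`, where the file name is an assumption; an empty result means no stringified references were found.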
From muriel.grosb at gmail.com Mon Apr 7 06:29:42 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 7 Apr 2014 14:29:42 +0200
Subject: [maker-devel] Help for Repeat Library Construction
Message-ID: <474C2DF8-B5DF-424B-BCF7-EC64BC23EEDC@gmail.com>

Hello,

I am working on the annotation of the date palm genome using the MAKER pipeline. I started by following the manual for Repeat Library Construction - Advanced. I am stuck at 2.1.3, where I should use muscle to filter, but I don't understand what the file flankingseqfile is. How can I obtain it? Also, do you have more information about 2.1.4 and 2.1.5?

Thanks a lot for this great pipeline and for your help,
Muriel Gros-Balthazard

From Brian.Mack at ARS.USDA.GOV Thu Apr 17 14:34:21 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Thu, 17 Apr 2014 20:34:21 +0000
Subject: [maker-devel] tbl2asn errors
Message-ID:

Hi,

I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER gff3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to have converted fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

4 ERROR: SEQ_FEAT.BadTrailingCharacter
217 ERROR: SEQ_FEAT.NoStop
438 ERROR: SEQ_FEAT.ShortIntron
171 ERROR: SEQ_FEAT.StartCodon
171 ERROR: SEQ_INST.BadProteinStart
291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
118 WARNING: SEQ_FEAT.ShortExon

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?
Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

From carsonhh at gmail.com Thu Apr 17 14:59:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Apr 2014 14:59:05 -0600
Subject: [maker-devel] tbl2asn errors
Message-ID:

The only one that may be a real error is the first one (I'm not sure what it means). You probably need to find them and open them in a viewer like apollo. The rest I would consider warnings (the NCBI tool doesn't like any weirdness or uncertainty). You often have to manually edit things to get NCBI to accept all models without complaining (sometimes even going against real biology). I know some groups use the always_complete=1 option in MAKER to force start and stop codons into every model for example (even though those forced codons are probably false).
*Not sure about this one
--> 4 ERROR: SEQ_FEAT.BadTrailingCharacter

*These are partial genes with no stop (usually happen at the edge of contigs or near strings of NNNN)
--> 217 ERROR: SEQ_FEAT.NoStop

*These are just short introns (intron size is under control of the ab initio predictors)
--> 438 ERROR: SEQ_FEAT.ShortIntron

*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN)
--> 171 ERROR: SEQ_FEAT.StartCodon

*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN)
--> 171 ERROR: SEQ_INST.BadProteinStart

*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence)
--> 291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor

*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence)
--> 648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor

*These are just short exons (exon size is under control of the ab initio predictors)
--> 118 WARNING: SEQ_FEAT.ShortExon

You probably need to identify examples of models causing each issue, and then look at them in Apollo. Apollo lets you open tbl format and save back to it.

I imagine the coordinate change is from NCBI using a 0-based coordinate system as opposed to a 1-based system (i.e. the first base is 0 rather than 1).

Unfortunately getting everything to go into NCBI is usually a grueling task.

--Carson

From Scott.Geib at ARS.USDA.GOV Thu Apr 17 14:59:22 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 20:59:22 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References:
Message-ID: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>

Hi Brian,

We have a tool in development to deal with this. You should not directly upload your MAKER output to NCBI; you need to filter out genes, check that things are sane, etc.

http://brianreallymany.github.io/GAG/

It is still in active development; the first full release is planned for the end of this month (if you can wait 1.5 weeks).
It has no dependencies and maintains parent/child relationships (for example, if you remove a gene, it will also remove the associated CDS/mRNA). In a release planned for the end of the month, you will be able to perform functions like removing short features, long features, flagging things for review, etc. It also generates an updated genome.fasta file, gff3 file, and sequence files for CDS/mRNA/peptide based on the edits made.

Hopefully this is helpful to you.

Scott

From carsonhh at gmail.com Thu Apr 17 15:27:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Apr 2014 15:27:53 -0600
Subject: [maker-devel] tbl2asn errors
In-Reply-To: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
Message-ID:

Very cool. I'll try it out as well.

--Carson
From Scott.Geib at ARS.USDA.GOV Thu Apr 17 16:37:49 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 22:37:49 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
Message-ID: <0D54878997A4B9478F03938D61DB51D4266C1E@001FSN2MPN1-015.001f.mgd2.msft.net>

Just so you are not discouraged: the current version has limited functionality and is pretty much undocumented (although it will write a .tbl file). I will email the list when the first real release is complete and documented.

Scott
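
On the one-base coordinate shift raised earlier in this thread: Carson's explanation (a 0-based versus 1-based coordinate system) is offered as a guess and is not confirmed in the thread. The arithmetic it implies can be sketched as below; the function name is hypothetical and the numbers are the ones Brian reported.

```python
def one_based_to_zero_based(start, end):
    """Shift a 1-based inclusive interval to a 0-based inclusive interval.

    Both ends move down by one, so the feature length is unchanged.
    """
    if start < 1 or end < 1:
        raise ValueError("1-based coordinates must be >= 1")
    return start - 1, end - 1


# Gene coordinates from the tbl file, as reported in the thread.
tbl_start, tbl_end = 440869, 441931
sqn_start, sqn_end = one_based_to_zero_based(tbl_start, tbl_end)
print(sqn_start, sqn_end)
```

If this explanation is right, the two files describe the same bases and only the numbering convention differs, so the shift by itself would not indicate a conversion error.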
From bioinformatics.umd at gmail.com Fri Apr 18 07:14:45 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 18 Apr 2014 09:14:45 -0400
Subject: [maker-devel] Short Introns
Message-ID:

Hello,

We are preparing two submissions for NCBI, nightmare. However, some of our MAKER gene models have short introns that are being flagged by NCBI. In one species we have >400 introns smaller than 20bp, which is almost biologically impossible. I know we can set max intron length in the opts.ctl file, but can we set a minimum intron length?

I saw yesterday's posts that mention this is a result of the external ab initio predictors, but I didn't see an indication as to which predictor and how to change that setting.

from yesterday:
*These are just short introns (intron size is under control of the ab initio predictors)
--> 438 ERROR: SEQ_FEAT.ShortIntron

Cheers
Ian

From carsonhh at gmail.com Fri Apr 18 09:35:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 18 Apr 2014 09:35:51 -0600
Subject: [maker-devel] Short Introns
In-Reply-To:
References:
Message-ID:

Look at the name of those genes. The original name will let you know where it came from because it will contain augustus, genemark, snap, etc. You will also want to open up the contig containing those genes in a viewer like apollo (http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz). See if the short intron is part of the CDS or UTR. If it's UTR, then it has evidence support from an EST, which either means there are problems with the EST/cDNA evidence or it's real.
For those, even if they are real you can just trim them off. If it's part of the CDS, then investigate whether it is suggested by EST or protein evidence, or if the ab initio predictor called it (sometimes the ab initio predictor calls things to force an ORF to work). This can sometimes be indicative of assembly issues in that region.

--Carson

From michael.seidl at wur.nl Tue Apr 22 08:27:18 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:27:18 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
Message-ID:

Hi,

I have a question on the post-processing of my maker output. I finished a maker run on a draft genome (231 scaffolds) without an error. To get a merged gff3 I run ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. However, I realized that it contains, next to gff3-conforming output, thousands of lines of array refs, e.g. ARRAY(0x188a8578).
The total number of produced scaffolds is correct; however, I have my doubts whether I successfully retrieved all annotations. Could you maybe point me to a possible solution?

Thanks in advance
Michael

From carsonhh at gmail.com Tue Apr 22 08:31:16 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:31:16 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References:
Message-ID:

I've never seen this. What version of MAKER are you using?

--Carson

From michael.seidl at wur.nl Tue Apr 22 08:37:33 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:37:33 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Hi Carson,

I am using maker 2.31.
Thanks
Michael

--
Michael F Seidl, PhD
Research Fellow (Postdoc)
Laboratory of Phytopathology
Wageningen University
P.O. Box 8025, 6700 EE Wageningen
Wageningen Campus, building 107 (Radix)
Droevendaalsesteeg 1, 6708 PB Wageningen
Tel.: +31-317-481288
Fax: +31-317-483412
Email: michael.seidl at wur.nl
Website: http://www.php.wur.nl/UK/
Twitter: @MFSeidl
www.disclaimer-uk.wur.nl

From carsonhh at gmail.com Tue Apr 22 08:39:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:39:51 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Could you check the individual contig GFF3's before merge. Do any of those contain array refs? Also is it exactly 2.31 or the current 2.31.3?
--Carson
From michael.seidl at wur.nl Tue Apr 22 08:43:44 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:43:44 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Dear Carson,

maker -version returns 2.31.

Yes, also the individual scaffolds seem to contain ARRAY refs, e.g. find -name "*gff" | xargs grep "ARRAY":

./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088

Cheers
M
From carsonhh at gmail.com Tue Apr 22 08:46:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:46:34 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Could you pack up this directory for me --> /84/ED/scaffold3.1/
and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Thanks,
Carson
From a.priyam at qmul.ac.uk Tue Apr 22 11:45:45 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Tue, 22 Apr 2014 23:15:45 +0530
Subject: [maker-devel] is using est_reads option safe?
Message-ID:

Hi,

I need to run MAKER against a genome with both raw (FASTQ) and assembled (FASTA) RNA-Seq data. I point MAKER to the assembled data using the est= option in maker_opts.ctl. Looking for how to point MAKER to the raw reads, I came across this thread https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where Dr. Carson Holt points out that est_gff should be used. However, from MAKER's run log it seems that the est_reads option is not deprecated, just hidden from plain sight by excluding it from maker_opts.ctl. So I set the est_reads option in maker_opts.ctl, and MAKER parses the control files and runs just fine.

Now I am left wondering if it's safe to use est_reads. As in, could it impact the predicted set negatively?

-- Priyam

From carsonhh at gmail.com Tue Apr 22 12:02:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 12:02:56 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

The est_reads option doesn't do anything. It is in the run log for backwards compatibility with old jobs, because MAKER has a restart capability (i.e. people can rerun new MAKER versions against old MAKER output in the same directory, and it can reuse old raw results to avoid rerunning analysis steps). The est_reads option was originally there for developer experimentation, but then it went away. You need to use an external tool like tophat and cufflinks to align short reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough option you mentioned). Or you can assemble them without alignment using something like trinity (then you can provide that result to the est= option because it will be in fasta format).
You should not use raw reads directly with MAKER, you need to preprocess them using one of the methods mentioned for them to be useful. Thanks, Carson On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >Hi, > >I need to run MAKER against a genome with both raw (FASTQ) and >assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >est= options in maker_opts.ctl. Looking for how to point MAKER to the >raw reads I came across this thread >https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >Dr. Carlson Holt points out that est_gff should be used. However, from >MAKER's run log it seems that est_reads option is not deprecated, just >hidden from plain sight by excluding it from maker_opts.ctl. So I set >est_reads option in maker_opts.ctl and MAKER parses the control files >and runs just fine. > >Now I am left wondering if it's safe to use est_reads. As in, could it >impact the predicted set negatively? > >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Apr 22 13:10:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 13:10:46 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl> Message-ID: The issue was indeed caused by a bug in using the other_gff= file option. Could you place the attached file in .../maker/lib/. Then you can rerun maker to test if it fixes it ('maker -a' for fast rerun without analysis rerun). 
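
One way to test whether a rerun fixed it is to count the stray lines before and after. The following is only a minimal Python sketch, assuming the stray entries are whole lines that begin with a stringified Perl array reference such as `ARRAY(0x41f6ea0)`, as in the grep output earlier in this thread. The function name `count_array_refs` is illustrative and is not part of MAKER.

```python
import re
from pathlib import Path

# Assumption: a stray entry is a line starting with "ARRAY(0x<hex>",
# matching the grep output shown earlier in the thread.
ARRAY_REF = re.compile(r"^ARRAY\(0x[0-9a-fA-F]+")


def count_array_refs(root, pattern="*.gff"):
    """Return {path: count} for files under root that contain stray lines."""
    counts = {}
    for path in sorted(Path(root).rglob(pattern)):
        with open(path, errors="replace") as handle:
            n = sum(1 for line in handle if ARRAY_REF.match(line))
        if n:
            # Only files with at least one stray line are reported.
            counts[str(path)] = n
    return counts
```

Calling `count_array_refs` on the job's output directory before and after the rerun should show whether the count has dropped to zero; an empty result means no matching lines were found.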
Alternately, if you don't feel like rerunning everything, you can also filter out the lines using --> grep -v "ARRAY" file.gff

Since the other_gff file is not used in any part of the analysis and is just a convenience option that prints any text given to it into the final GFF3 file, filtering the lines out is the same as if you had left other_gff blank when running MAKER. You can then use 'gff3_merge -s tophat.gff merged_genome.gff' to merge the desired extra lines back into your file outside of MAKER.

Thanks,
Carson

From: Michael Seidl
Date: Tuesday, April 22, 2014 at 12:29 PM
To: Carson Holt
Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'

Hi Carson,

I uploaded the files as an archive.

Thanks
Michael

On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt wrote:
> In the base maker.output directory for the job, there will be a file with a
> .db extension. Could you send that as well? I'm leaning towards this being
> something odd happening with the GFF3 files used as input. Particularly the
> other_gff= file. Could you upload this file as well -->
> /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/tophat.gff3.
>
> --Carson
>
> From: Michael Seidl
> Date: Tuesday, April 22, 2014 at 8:56 AM
> To: Carson Holt
> Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
>
> Should be uploading right now...
>
> Thanks Michael

--
Michael F Seidl, PhD
Research Fellow (Postdoc)
Laboratory of Phytopathology
Wageningen University
P.O. Box 8025, 6700 EE Wageningen
Wageningen Campus, building 107 (Radix)
Droevendaalsesteeg 1, 6708 PB Wageningen

Tel.: +31-317-481288
Fax: +31-317-483412

Email: michael.seidl at wur.nl
Website: http://www.php.wur.nl/UK/
Twitter: @MFSeidl

www.disclaimer-uk.wur.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GFFDB.pm
Type: text/x-perl-script
Size: 52152 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue Apr 22 14:35:31 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 14:35:31 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl>
Message-ID:

You can provide a comma separated list of files to est_gff. Also, from experience, cufflinks gives far better results than tophat. Tophat tends to have a lot of false positives that adversely affect the overall quality of gene models, so I usually recommend that people use cufflinks output and not even include the tophat results in their run.

Thanks,
Carson

From: Michael Seidl
Date: Tuesday, April 22, 2014 at 2:30 PM
To: Carson Holt
Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'

Dear Carson,

thanks a lot I will try.
More importantly, you pointed me to a mistake in my procedure which will make me rerun maker anyway :p

I want maker to use the tophat.gff next to the cufflinks est (fa + gff) as well as a protein.fa. I provide them currently as follows:

#-----EST Evidence (for best results provide a file for at least one)
est= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.cds.fa #set of ESTs or assembled mRNA-seq
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.gff3 #aligned ESTs or mRNA-seq from a
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= /home/michael/data/side/alternaria/maker_annotation/fungal_proteins.fa #protein sequence file in fasta format (i.e. from mu
protein_gff= #aligned protein homology evidence from an external GFF3 file

Can I give the tophat.gff as an altest_gff, or is maker internally using est_gff and altest_gff differently?

Sorry for this question, but I had not yet realized that other_gff is omitted during the maker run.

Thanks a lot
Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From a.priyam at qmul.ac.uk Wed Apr 23 03:55:37 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Wed, 23 Apr 2014 15:25:37 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

Thanks, Carson.

I now understand that I shouldn't use the est_reads option.

Does MAKER utilise est_gff for prediction or simply pass the annotations through to the output GFF? In that case how is it different from using other_gff / model_gff (what's the difference between these two?)

I have both assembled and raw reads. Is it sufficient to just use the assembled set?

-- Priyam

From carsonhh at gmail.com Wed Apr 23 08:43:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 23 Apr 2014 08:43:54 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

est_gff is the equivalent of est=, but because the alignment structure is already in the GFF3, I don't need to align sequence with blastn/exonerate. model_gff and pred_gff are essentially the same, with the difference being that model_gff models can be kept in the final results even without supporting evidence, but pred_gff models won't be. Pred_gff needs evidence support because it is a potential model, where model_gff is considered a known model even if the structure of that model may be uncertain.

other_gff is just a convenience method for passing through GFF3 features to the final result.
It's impossible to have MAKER be aware of every kind of possible entry, so if you want something more exotic in the final output (sequence variant information, alternate alleles, promoter and methylation sites, etc.) then you can pass it in there and it will just be printed into the file. It's basically the equivalent of concatenating two GFF3 files together, but it handles the proper reordering of sequence information at the end of the GFF3 file (because technically you can't just concatenate GFF3 files end-to-end). You can also use the gff3_merge tool that comes with MAKER to get the same effect.

--Carson

From kdelmore at zoology.ubc.ca Tue Apr 22 22:48:08 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 22 Apr 2014 21:48:08 -0700
Subject: [maker-devel] problem with dsindex
Message-ID: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>

I am having some trouble with the dsindex tool. I used the fasta_tool to split my original multifasta file and ran maker with the -base and -g flags. I then used the dsindex tool to summarize results from each fasta. The tool finished without an error message and pointed me to where the files should be, but when I went to that directory there was no datastore, and the index.log said that it had started on each of the fastas but not finished.
I got around this problem with gff3_merge by using the -o option and providing paths to the gff files, but this is not working with the fasta_merge tool. I don't want to just cat the files together because I want to be sure the merged gff and protein.fasta files are the same for downstream annotation steps. I've included examples of the commands I used below and the output from dsindex. Note that the individual fastas finished without errors and produced datastores.

I would really appreciate any input you might have with this problem and THANK YOU for developing such a user friendly pipeline.

/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides

maker/bin/maker -dsindex -fix_nucleotides
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/placed.maker.output/placed_datastore ##this directory was not generated
To access files for individual sequences use the datastore index:
/placed.maker.output/placed_master_datastore_index.log

/maker/bin/gff3_merge -o placed.gff *

/maker/bin/fasta_merge -o placed.all 1.maker.proteins.fasta 2.maker.proteins.fasta ##this did not work

From carson.holt at genetics.utah.edu Wed Apr 23 08:51:59 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 23 Apr 2014 14:51:59 +0000
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

I don't think all your contigs are finished, or you did not supply the -base tag when running -dsindex. If it says STARTED rather than FINISHED, then the output files for that contig are missing from the directory it is looking at.
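
The STARTED versus FINISHED check can also be scripted. The following is only a minimal Python sketch, assuming the master datastore index log is tab-separated with the sequence name in the first column and the status in the third; that layout and the function name `unfinished_contigs` are assumptions and are not stated in this thread.

```python
def unfinished_contigs(index_log):
    """Return names that appear in the index but never reach FINISHED.

    Assumed layout (not confirmed here): name <TAB> directory <TAB> status.
    """
    finished = {}  # name -> True once a FINISHED line has been seen
    with open(index_log) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            name, status = fields[0], fields[2].strip()
            finished[name] = finished.get(name, False) or status == "FINISHED"
    # dicts keep insertion order, so names come back in first-seen order
    return [name for name, done in finished.items() if not done]
```

Running it on something like placed.maker.output/placed_master_datastore_index.log would list the contigs whose output is still missing; an empty list means every indexed contig reached FINISHED.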
For example, this is how you should be running everything -->

/maker/bin/fasta_tool --split placed.fasta
mpiexec -n 4 /maker/bin/maker -base placed -g 1.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 2.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 3.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 4.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 5.fasta -fix_nucleotides

Now all will write to placed.maker.output

Then you need to do this -->

maker/bin/maker -dsindex -base placed -g placed.fasta

Then it will rebuild the index for placed.maker.output/placed_master_datastore_index.log

Thanks,
Carson

From carsonhh at gmail.com Wed Apr 23 08:57:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 23 Apr 2014 08:57:34 -0600
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Also, fasta_merge works differently than gff3_merge. It requires the datastore index because it is trying to find directories and then 'type' and 'group' the fasta files in those directories. Without the datastore index, it is the equivalent of 'cat file1.fa file2.fa > file3.fa'. It also requires the '-i' flag when specifying individual fasta files.

--Carson

From a.priyam at qmul.ac.uk Thu Apr 24 01:28:38 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Thu, 24 Apr 2014 12:58:38 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

You say est_gff is the equivalent of est= (except that the alignment structure is a part of the gff). What would MAKER do if I set both the est= and est_gff= options in maker_opts.ctl? Will it ignore est=?

-- Priyam

From carsonhh at gmail.com Thu Apr 24 08:15:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 24 Apr 2014 08:15:07 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

It will use both.
You can also provide multiple files to either option as comma-separated lists.

--Carson

From anurag08priyam at gmail.com Thu Apr 24 08:26:24 2014
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Thu, 24 Apr 2014 19:56:24 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

That answers all my questions. Thanks, Carson.

-- Priyam

From matthew.macmanes at unh.edu Sat Apr 26 08:56:25 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Sat, 26 Apr 2014 10:56:25 -0400
Subject: [maker-devel] Use of each() on hash
Message-ID:

Hello,

I am getting a large number of errors while running maker on my ubuntu server.

Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x2045200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x837200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x9d1200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.

It is unclear how this affects the results or performance of the software, but these errors are repeated thousands of times in even a small run.

For the record: Maker 2.31, Ubuntu 14.04, perl 5.18.2, MPI via OpenMPI.

Compiled perl modules using ./build

Thanks for any insight anyone may have.

__________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

From carsonhh at gmail.com Sat Apr 26 09:26:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 09:26:24 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References:
Message-ID:

The message appears to be coming from forks.pm. It is probably a warning added to perl 5.18.2, which is very new (other versions don't care about this), and most developers would not consider 5.18 a fully stable release for production purposes (it will have lots of test features and messages that will get improved or dropped rather quickly). You can try updating the forks module from CPAN. Otherwise I would ignore it, as forks is sufficiently tested to know it works (it's not a MAKER module, it's a widely used CPAN module - literally tens of thousands of scripts use it worldwide). The authors of forks.pm will take steps to silence the warning rather quickly, or the warning will be removed from the perl interpreter altogether.

Thanks,
Carson

Sent from my iPhone
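
As the text of the warning itself says, it is raised when each() is called on a hash that has had keys inserted since its iterator was last reset. The following is a minimal sketch of one common way to avoid that situation, not the forks.pm code: the drain_queue helper, the job ids and the "follow-up" counts are all hypothetical names made up for illustration. It iterates over a snapshot taken with keys() instead of calling each(), so inserting into the hash inside the loop never touches a live iterator.

```perl
use strict;
use warnings;

# Hypothetical work queue (illustrative only, not taken from forks.pm):
# keys are job ids, values are the number of follow-up jobs to enqueue.
sub drain_queue {
    my ($queue) = @_;    # hash reference
    my @done;

    # Each pass takes a fresh snapshot of the keys. Insertions made while
    # processing the snapshot are picked up on the next pass, and each()
    # is never called on a hash that changed underneath it.
    while (my @ids = sort keys %$queue) {
        for my $id (@ids) {
            my $followups = delete $queue->{$id};
            push @done, $id;

            # Inserting here is safe: we are walking @ids, not the hash.
            $queue->{"$id.$_"} = 0 for 1 .. $followups;
        }
    }
    return @done;
}

my %queue = (a => 2, b => 0);
my @order = drain_queue(\%queue);
print join(",", @order), "\n";    # prints a,b,a.1,a.2
```

The outer while loop plays the same role as re-checking the hash after it has been modified; the cost is one extra list of keys per pass, which is usually negligible for small bookkeeping hashes.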
From cjfields at illinois.edu Sat Apr 26 21:34:16 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 27 Apr 2014 03:34:16 +0000
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References:
Message-ID: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

See this RT ticket:

https://rt.cpan.org/Public/Bug/Display.html?id=86910

The specific warning in question is there for a good reason; Reini Urban wrote about it recently and why it is bad:

http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html

There is a possible 2-line fix, mainly changing a while loop to a for loop, but the bug (originally reported in summer 2013) is unfortunately still open.

Just a note, I don't agree that perl 5.18.2 is a development release. Even-numbered minor releases (5.10, 5.12, ...) are considered stable/production, odd-numbered ones (5.19) are developer releases. I do agree that initial .0 'patch' releases (e.g. 5.18.0) are generally to be avoided, but I always try to use a more recent version of perl when possible. This version is two releases past the .0, and perl 5.20 (next stable) is due next month.

chris
From carsonhh at gmail.com Sat Apr 26 22:06:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:06:46 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
Message-ID:

Yah, I had already seen that ticket. It's related to changing the function from a while loop to a foreach loop just to suppress the warning. Not sure why the forks.pm maintainer hasn't looked at it, but I imagine he will probably just do something more like --> no warnings qw(each); or whatever would suppress that warning without altering anything else in the code.

I wouldn't say 5.18 is a development release. What I said is that it's not good for 'production'. The problem is that most systems still use 5.10 and 5.12, with a very few only recently moving to 5.16 (amazon's EC2 images for example). So you will find that issues with even very popular CPAN modules (as we see here) will be more common in something like 5.18.X. Not because 5.18 is flawed or buggy, but because it's not yet used enough to flush out all the secondary issues it can cause elsewhere in the wider world of perl.

Thanks,
Carson

From carsonhh at gmail.com Sat Apr 26 22:51:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:51:30 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
Message-ID:

If you don't want to wait for the forks.pm maintainer to alter his code and submit an update to CPAN, you should be able to suppress the warning by manually editing forks.pm line 1736 yourself.

Change it from this -->

    $write = each %WRITE;

To this (make sure to include the {} brackets) -->

    {
        no warnings qw(internal);
        $write = each %WRITE;
    }

The issue is that the module's author has his code calling 'each', altering the hash, and then calling 'each' again, which causes a warning in perl 5.18+.
In this case it's relatively innocuous because of how the value and 'each' function are being used (any hash reordering ends up being handled in an outer while loop).

Thanks,
Carson

From muriel.grosb at gmail.com Mon Apr 28 02:35:25 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 28 Apr 2014 10:35:25 +0200
Subject: [maker-devel] Repeat Library Construction : Exclusion of gene fragments
Message-ID: <535E12CD.9020302@gmail.com>

Hello!

I ran RepeatModeler and separated the output into ModelerID.lib and Modelerunknown.lib as explained in the protocol. In total, I have about 600 sequences in these two files. I now want to exclude gene fragments. I downloaded all the plant protein sequences from UniProtDB and plan to use blastx. However, I don't know which parameters I should use for blastx; in particular, what -e value?
Thanks a lot for your help,
Muriel GB

From mhinsley at ebi.ac.uk Tue Apr 29 02:21:06 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Tue, 29 Apr 2014 09:21:06 +0100
Subject: [maker-devel] unexpected alternate splicing
Message-ID: <535F60F2.5050902@ebi.ac.uk>

Hi

We've just reinstalled maker 2.31 using mpich3 (3.1) and are delighted that file locking and other issues have been resolved. (I'm running maker across several nodes on the compute farm.) The maker code is identical: I took the previous tar.gz archive and made a clean build.

Using a copy of a previous configuration to test, the only differences I can see are that the location of some files has changed (the working directory is on a different file system) and that I'm using a bigger (unfiltered) repeat library.

The previous maker run produced 17393 genes and 17393 mRNAs, and this new version gives 15927 genes and 21328 mRNAs.

I have alt_splice=0:

$ grep splice ../maker_opts.ctl
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no

Any idea why I'm getting multiple mRNAs per gene?

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Tue Apr 29 06:59:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Apr 2014 06:59:04 -0600
Subject: [maker-devel] unexpected alternate splicing
In-Reply-To: <535F60F2.5050902@ebi.ac.uk>
References: <535F60F2.5050902@ebi.ac.uk>
Message-ID: <1653CD3E-CEB7-437E-88CC-0F65C9BDA931@gmail.com>

Are you using gff3 files as input? If so, could you send those to me? They are probably coming from those.

--carson

Sent from my iPhone

From carson.holt at genetics.utah.edu Wed Apr 30 08:53:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 30 Apr 2014 14:53:29 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1398869131512.52399@uga.edu>
References: <1398869131512.52399@uga.edu>
Message-ID:

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani
From carsonhh at gmail.com Wed Apr 30 08:55:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Apr 2014 08:55:12 -0600
Subject: [maker-devel] FW: protein2genome gene models
Message-ID:

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From sjackman at gmail.com Wed Apr 30 17:25:17 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 30 Apr 2014 16:25:17 -0700
Subject: [maker-devel] est_forward and conflicting names
Message-ID:

Hi, Carson.

I've downloaded a number of genes from GenBank using Entrez Direct, which I'm using with est and protein to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I'm using est_forward to retain the gene names. This workflow is working well for me.

Some of the genes pulled in from GenBank have less useful names like orf1234 or other numeric IDs. When multiple evidence sequences map to the same location, how does est_forward choose which name to use? If it's chosen arbitrarily, would it be possible to choose the most common name instead?
Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 05:23:59 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 07:23:59 -0400 Subject: [maker-devel] Installing Maker on Cray Message-ID: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at icloud.com Tue Apr 1 06:58:35 2014 From: carson.holt at icloud.com (Carson Holt) Date: Tue, 01 Apr 2014 06:58:35 -0600 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: Message-ID: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone > On Apr 1, 2014, at 5:23 AM, Matthew MacManes wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. 
Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 10:11:55 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 12:11:55 -0400 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt wrote: > Create a soft link called mpicc. I can't guarantee shared libraries are > installed on you system though as not all system derived versions of MPICH2 > have been configured with shared libraries. 
> > --Carson > > > > Sent from my iPhone > > On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray > supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's > own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the > location of mpicc - I can give the full path to Cray equivalent cc, but it > is not recognized. Many other programs allow me to specify the c compiler, > e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > *Matthew MacManes*, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Apr 1 10:29:40 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Apr 2014 16:29:40 +0000 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: <350474CE-B7EB-4EFF-9C8B-AD71FBB81CA3@illinois.edu> We might be interested in that ourselves at some point: https://bluewaters.ncsa.illinois.edu chris On Apr 1, 2014, at 11:11 AM, Matthew MacManes > wrote: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ Matthew MacManes, Ph.D. 
University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt > wrote: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ Matthew MacManes, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason at bioperl.org Tue Apr 1 12:39:14 2014 From: jason at bioperl.org (Jason Stajich) Date: Tue, 1 Apr 2014 11:39:14 -0700 Subject: [maker-devel] maker to EvidenceModeler In-Reply-To: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> References: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com> Message-ID: I've used this script I wrote to make the necessary input files from maker GFF3. https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/maker2evm.pl Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Tue, Mar 25, 2014 at 9:33 AM, dhivya arasappan wrote: > Hi Carson and others, > > Is there an easy tool/pipeline available as part of maker utilities to > convert maker and SNAP output to files acceptable by EvidenceModeler? > > It looks like it also needs just gff files, but with a few tweaks. > EvidenceModeler seems better equipped to handle PASA annotation results > than maker results. > > Thanks > Dhivya > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 1 12:36:44 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 01 Apr 2014 12:36:44 -0600 Subject: [maker-devel] Missing UTRs in GFF In-Reply-To: References: Message-ID: It was indeed caused by the correct_est_fusion=1 option (which is supposed to trim off UTR if it appears overlap of UTR across genes is caused by merged mRNAseq). I have attached a patch that is used to replace .../maker/lib/maker/auto_annotator.pm, and I've updated the website download to include the patch as in MAKER download version 2.31.3. 
Thanks, Carson From: Benjamin Rubin Date: Tuesday, April 1, 2014 at 9:21 AM To: Carson Holt Subject: Re: [maker-devel] Missing UTRs in GFF OK, I think I uploaded everything. I included a cleaned up version of the control file without all of my paths in case that is useful. Thanks, Ben On Tue, Apr 1, 2014 at 9:50 AM, Carson Holt wrote: > Could upload your input fasta and hmm files as well. Sometimes I can > reproduce errors using just the raw reports, but it looks like I will need the > input files. > > --Carson > > > From: Benjamin Rubin > Date: Tuesday, April 1, 2014 at 8:38 AM > To: Carson Holt > Subject: Re: [maker-devel] Missing UTRs in GFF > > Hi Carson, > > I tried using version 2.31 on a scaffold where this problem occurred with 2.30 > and got the same result, unfortunately. I did use corr_est_fusion=1 both times > so this might be related. I have uploaded the sequence for this scaffold and > the output directory under username "brubin". Is this the data that you meant? > > I am also reattaching information on a representative problem gene from this > scaffold that occurs at base 1330779. > > Thanks so much for the help, > Ben > > > On Mon, Mar 31, 2014 at 9:37 AM, Carson Holt wrote: >> Not something I've seen before, but there was a patch for another issue that >> was cause by the use of avoid_est_fusion=1, that may be related. Try the >> current stable release 2.31, and let me know if it still happens. >> >> You can also upload the contig folder from one of the regions in question >> here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then I could verify the bug, and see if it is something that happens in the >> current release. >> >> --Carson >> >> >> From: Benjamin Rubin >> Date: Saturday, March 29, 2014 at 10:24 AM >> To: >> Subject: [maker-devel] Missing UTRs in GFF >> >> I have annotated a eukaryotic genome with MAKER 2.30. 
I recently realized >> that there are a few genes in the GFF file produced by gff3_merge with >> inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs >> have their own lines in the GFF file. However, for the problematic genes, the >> UTRs are not specified in the GFF file and all exons are annotated as CDS. >> The UTRs do appear in the gene header and the protein sequences are the >> correct length (do not include the UTR). I have attached an example from the >> GFF file. >> >> Is this a known problem, or have I done something wrong? Is there an easy way >> to fix the GFF file? >> >> Thanks for your help, >> Ben >> >> -- >> _____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > > > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > benrubin.org > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: auto_annotator.pm Type: text/x-perl-script Size: 101568 bytes Desc: not available URL: From amelia.ireland at gmod.org Thu Apr 3 15:10:53 2014 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Thu, 3 Apr 2014 14:10:53 -0700 Subject: [maker-devel] GMOD Online Training 2014 Message-ID: Greetings GMOD community! Applications are now open for the 2014 GMOD online training course, to be held from May 19th - 23rd 2014. The course will cover the installation, configuration, and usage of core GMOD software, including GBrowse and JBrowse, Galaxy, MAKER, Tripal, WebApollo, Canto, and the Chado database. The course is taught by experienced instructors and developers with deep knowledge of the tools. Although the course will be run online, students will be able to interact with the tutors and fellow attendees, ask questions, and so on. For more information and to apply, please see http://gmod.org/wiki/GMOD_Online_Training_2014 If you have any questions, please contact the GMOD help desk at help at gmod.org. Thanks! -- Amelia Ireland GMOD Community Support Generic Model Organism Database project http://gmod.org || @gmodproject -------------- next part -------------- An HTML attachment was scrubbed... URL: From Brian.Mack at ARS.USDA.GOV Mon Apr 7 06:55:01 2014 From: Brian.Mack at ARS.USDA.GOV (Mack, Brian) Date: Mon, 7 Apr 2014 12:55:01 +0000 Subject: [maker-devel] maker_functional_gff Message-ID: Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. 
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I'm doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 7 08:58:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 08:58:20 -0600 Subject: [maker-devel] maker_functional_gff Message-ID: maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder. --Carson From: "Mack, Brian" Date: Monday, April 7, 2014 at 6:55 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] maker_functional_gff Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular ?-outfmt 6? option against the uniprot uniref-50. 
I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I'm doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 7 09:02:55 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 09:02:55 -0600 Subject: [maker-devel] maker_functional_gff In-Reply-To: References: Message-ID: I added a line to look for the UniRef header format in the attached scripts. Go ahead and give it a try.
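For anyone who would rather rewrite the FASTA headers than swap in a patched script, the substitution suggested earlier in this thread (Tax= to OS=, RepID= to GN=, plus a placeholder PE= at the end) could look roughly like the Python sketch below. The function name and the placeholder value PE=0 are assumptions made for illustration, and whether maker_functional_gff accepts the rewritten headers has not been tested here.

```python
import re


def uniref_to_uniprot_style(line: str) -> str:
    """Rewrite a UniRef FASTA header so it carries OS=, GN= and PE= tags.

    Sequence lines (anything not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    header = line.rstrip("\n")
    header = re.sub(r"\bTax=", "OS=", header)    # taxon name -> organism tag
    header = re.sub(r"\bRepID=", "GN=", header)  # representative ID -> gene-name tag
    if " PE=" not in header:
        header += " PE=0"  # placeholder only; the value 0 is an assumption
    return header


# Header copied from the error message quoted in this thread.
example = (
    ">UniRef50_K1R9E3 Uncharacterized protein n=1 "
    "Tax=Crassostrea gigas RepID=K1R9E3_CRAGI"
)
print(uniref_to_uniprot_style(example))
```

Applied line by line to the UniRef FASTA file, this leaves the sequences alone and only touches header lines.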
--Carson From: Carson Holt Date: Monday, April 7, 2014 at 8:58 AM To: "Mack, Brian" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker_functional_gff maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder. --Carson From: "Mack, Brian" Date: Monday, April 7, 2014 at 6:55 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] maker_functional_gff Hi, I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular ?-outfmt 6? option against the uniprot uniref-50. I put these results in the maker_functional_gff program using ?maker_functional_gff uniref-50 blastp-output maker.gff? but I get the following errors and no updating of the names in my maker gff file: Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097. Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128. Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128. Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI Any ideas of what I?m doing wrong? Brian This electronic message contains information generated by the USDA solely for the intended recipients. 
Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_functional_fasta Type: application/octet-stream Size: 3452 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_functional_gff Type: application/octet-stream Size: 4103 bytes Desc: not available URL: From darasappan at gmail.com Mon Apr 7 09:57:08 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 7 Apr 2014 10:57:08 -0500 Subject: [maker-devel] keep_preds parameter Message-ID: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Hello, I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?
Thanks dhivya From dence at genetics.utah.edu Mon Apr 7 10:06:15 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 7 Apr 2014 16:06:15 +0000 Subject: [maker-devel] keep_preds parameter In-Reply-To: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: Hi Dhivya, That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with and AED equal to zero have no support from the evidence sets (protein and EST and alt_EST). ~Daniel On Apr 7, 2014, at 9:57 AM, dhivya arasappan wrote: > Hello, > > I?m looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1? > > Thanks > dhivya > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From darasappan at gmail.com Mon Apr 7 10:31:55 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 7 Apr 2014 11:31:55 -0500 Subject: [maker-devel] keep_preds parameter In-Reply-To: References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: Thank you Daniel. But I thought an AED score of zero indicates complete agreement of annotation to evidence and that 1 would mean no agreement? 
Dhivya On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote: > Hi Dhivya, > > That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with and AED equal to zero have no support from the evidence sets (protein and EST and alt_EST). > > ~Daniel > On Apr 7, 2014, at 9:57 AM, dhivya arasappan > wrote: > >> Hello, >> >> I?m looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1? >> >> Thanks >> dhivya >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Mon Apr 7 10:33:59 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 10:33:59 -0600 Subject: [maker-devel] keep_preds parameter In-Reply-To: References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com> Message-ID: True. Daniel had the numbers backwards (I often accidentally do that as well). --Carson On 4/7/14, 10:31 AM, "dhivya arasappan" wrote: >Thank you Daniel. But I thought an AED score of zero indicates complete >agreement of annotation to evidence and that 1 would mean no agreement? 
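To make the AED convention concrete, here is a minimal Python sketch of listing unsupported models from a merged GFF3 file. It assumes the corrected reading given in this thread (AED 0 means complete agreement with the evidence, AED 1 means no support) and that the _AED attribute sits in column 9 of mRNA features. The function name, the threshold parameter and the two example records are hypothetical.

```python
def unsupported_mrnas(gff_lines, threshold=1.0):
    """Yield (ID, AED) for mRNA features whose _AED is >= threshold."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # not a feature line, or not an mRNA feature
        # Column 9 is a semicolon-separated list of key=value pairs.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "_AED" in attrs and float(attrs["_AED"]) >= threshold:
            yield attrs.get("ID", "?"), float(attrs["_AED"])


# Two made-up records, for illustration only.
sample = [
    "##gff-version 3\n",
    "scaf1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.05\n",
    "scaf1\tmaker\tmRNA\t2000\t2600\t.\t-\t.\tID=m2;Parent=g2;_AED=1.00\n",
]
print(list(unsupported_mrnas(sample)))
```

Lowering the threshold (for example to 0.5) would also list weakly supported models; the right cut-off depends on the data.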
> >Dhivya > >On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote: > >> Hi Dhivya, >> >> That's a correct understanding of keep_preds, and it is a binary >>parameter; you either tell MAKER to keep the unsupported predictions or >>not to keep the unsupported predictions. In the output, you can tell >>which genes are supported by the _AED attribute in the gff3 file. Genes >>with and AED equal to zero have no support from the evidence sets >>(protein and EST and alt_EST). >> >> ~Daniel >> On Apr 7, 2014, at 9:57 AM, dhivya arasappan >> wrote: >> >>> Hello, >>> >>> I?m looking for a little more explanation about keep_preds parameter. >>>The documentation says that it is a threshold to add unsupported gene >>>predictions. Along with some other changes, I set keep_preds=1 and saw >>>a huge jump in the number of genes I was getting. Is setting this >>>parameter to 1 equivalent to saying, include all predicted genes in my >>>output, even if they are not supported by my set or protein data? Is >>>there a way to tell from my output which genes are unsupported and >>>which are not? Also, are the only two options for this parameter 0 and >>>1? >>> >>> Thanks >>> dhivya >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nextgen.usfs at gmail.com Mon Apr 7 16:34:32 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Mon, 7 Apr 2014 17:34:32 -0500 Subject: [maker-devel] fasta_merge ARRAY error Message-ID: Hello, I?m getting an error when running fasta_merge as follows: Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942. 
The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file? Thanks, Jon From dence at genetics.utah.edu Mon Apr 7 19:23:07 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 8 Apr 2014 01:23:07 +0000 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: Hi Jon, Will you please send the command that gave you that error? Also, will you upload the maker control files you used and the gff3 file to the URL below? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 Also, which version of MAKER are you using? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS Ion PGM [nextgen.usfs at gmail.com] Sent: Monday, April 07, 2014 4:34 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] fasta_merge ARRAY error Hello, I?m getting an error when running fasta_merge as follows: Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942. The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file? 
Thanks, Jon _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Apr 7 20:02:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 07 Apr 2014 20:02:30 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: What version of MAKER are you using, and did you run with the new trnascan option turned on? Basically the script is finding a fasta file for transcripts but the file for proteins is missing. Turning trnascan on can do this (obviously tRNAs can encode transcripts but don't encode proteins). The version of fasta_merge included in the current MAKER 2.31.3 download should handle this correctly. --Carson On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >Hi Jon, Will you please send the command that gave you that error? Also, >will you upload the maker control files you used and the gff3 file to the >URL below? > >http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 > >Also, which version of MAKER are you using? > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >Ion PGM [nextgen.usfs at gmail.com] >Sent: Monday, April 07, 2014 4:34 PM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] fasta_merge ARRAY error > >Hello, > >I?m getting an error when running fasta_merge as follows: > >Can't use an undefined value as an ARRAY reference at >/home/ngs/maker/bin/fasta_merge line 116, line 1942. > >The result is that the fasta files are somewhat truncated, that is they >do not match the gff3 file created from gff3_merge (which does run >without any errors). Seems like it is getting stuck somewhere and then >crashes. 
Is there another way to easily get the CDS out of the maker >generated GFF file? > >Thanks, > >Jon > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nextgen.usfs at gmail.com Tue Apr 8 06:56:22 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Tue, 8 Apr 2014 07:56:22 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: Message-ID: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Hi Carson and Daniel, I'm running Maker 2.31.2 and yes I did have tRNAscan turned on - so perhaps I should just get fasta_merge from 2.31.3 and give it a shot. But first to clarify, fasta_merge -d maker1_master_datastore_index.log - returns the appropriate files, however both the maker.all.proteins.fasta and maker.all.transcripts.fasta return 7401 with a grep command counting ">", while the gff3_merge -d maker1_master_datastore_index.log runs without failure and a grep command counting "gene" returns 7525 models. I uploaded the files requested below. Thanks for the help. -Jon On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > What version of MAKER are you using, and did you run with the new trnascan > option turned on? Basically the script is finding a fasta file for > transcripts but the file for proteins is missing. Turning trnascan on can > do this (obviously tRNAs can encode transcripts but don't encode > proteins). The version of fasta_merge included in the current MAKER > 2.31.3 download should handle this correctly. > > --Carson > > > > On 4/7/14, 7:23 PM, "Daniel Ence" wrote: > >> Hi Jon, Will you please send the command that gave you that error? Also,
Also, >> will you upload the maker control files you used and the gff3 file to the >> URL below? >> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >> >> Also, which version of MAKER are you using? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >> Ion PGM [nextgen.usfs at gmail.com] >> Sent: Monday, April 07, 2014 4:34 PM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] fasta_merge ARRAY error >> >> Hello, >> >> I?m getting an error when running fasta_merge as follows: >> >> Can't use an undefined value as an ARRAY reference at >> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >> >> The result is that the fasta files are somewhat truncated, that is they >> do not match the gff3 file created from gff3_merge (which does run >> without any errors). Seems like it is getting stuck somewhere and then >> crashes. Is there another way to easily get the CDS out of the maker >> generated GFF file? 
>> >> Thanks, >> >> Jon >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Tue Apr 8 08:54:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 08:54:05 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: I've attached the fixed version (I see that the patched one is not in 2.31.3, but I'll get that taken care of). The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other files will have only the coding genes. --Carson On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: >Hi Carson and Daniel, >I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >returns the appropriate files, however both the maker.all.proteins.fasta >and maker.all.transcripts.fasta return 7401 with a grep command counting >?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >without failure and a grep command counting ?gene? returns 7525 models. > >I uploaded the files requested below. Thanks for the help. > >-Jon > > >On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > >> What version of MAKER are you using, and did you run with the new >>trnascan >> option turned on? Basically the script is finding a fasta file for >> transcripts but the file for proteins is missing. Turning trnascan on >>can >> do this (obviously tRNAs can encode transcripts but don't encode >> proteins). 
The version of fasta_merge included in the current MAKER >> 2.31.3 download should handle this correctly. >> >> --Carson >> >> >> >> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >> >>> Hi Jon, Will you please send the command that gave you that error? >>>Also, >>> will you upload the maker control files you used and the gff3 file to >>>the >>> URL below? >>> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>> >>> Also, which version of MAKER are you using? >>> >>> Thanks, >>> Daniel >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>USFS >>> Ion PGM [nextgen.usfs at gmail.com] >>> Sent: Monday, April 07, 2014 4:34 PM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] fasta_merge ARRAY error >>> >>> Hello, >>> >>> I?m getting an error when running fasta_merge as follows: >>> >>> Can't use an undefined value as an ARRAY reference at >>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>> >>> The result is that the fasta files are somewhat truncated, that is they >>> do not match the gff3 file created from gff3_merge (which does run >>> without any errors). Seems like it is getting stuck somewhere and then >>> crashes. Is there another way to easily get the CDS out of the maker >>> generated GFF file? 
>>> >>> Thanks, >>> >>> Jon >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: fasta_merge Type: application/octet-stream Size: 2978 bytes Desc: not available URL: From nextgen.usfs at gmail.com Tue Apr 8 10:01:18 2014 From: nextgen.usfs at gmail.com (Jon Palmer) Date: Tue, 08 Apr 2014 11:01:18 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: <53441D4E.2070502@gmail.com> Thanks Carson, error is gone and is now working. Thanks for a great tool and for the fantastic support! -Jon On 04/08/2014 09:54 AM, Carson Holt wrote: > I've attached the fixed version (I see that the patched one is not in > 2.31.3, but I'll get that taken care of). > > The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other > files will have only the coding genes. > > --Carson > > > > On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: > >> Hi Carson and Daniel, >> I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >> perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >> But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >> returns the appropriate files, however both the maker.all.proteins.fasta >> and maker.all.transcripts.fasta return 7401 with a grep command counting >> ?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >> without failure and a grep command counting ?gene? returns 7525 models. >> >> I uploaded the files requested below. Thanks for the help. 
>> >> -Jon >> >> >> On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: >> >>> What version of MAKER are you using, and did you run with the new >>> trnascan >>> option turned on? Basically the script is finding a fasta file for >>> transcripts but the file for proteins is missing. Turning trnascan on >>> can >>> do this (obviously tRNAs can encode transcripts but don't encode >>> proteins). The version of fasta_merge included in the current MAKER >>> 2.31.3 download should handle this correctly. >>> >>> --Carson >>> >>> >>> >>> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >>> >>>> Hi Jon, Will you please send the command that gave you that error? >>>> Also, >>>> will you upload the maker control files you used and the gff3 file to >>>> the >>>> URL below? >>>> >>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>>> >>>> Also, which version of MAKER are you using? >>>> >>>> Thanks, >>>> Daniel >>>> >>>> >>>> Daniel Ence >>>> Graduate Student >>>> Eccles Institute of Human Genetics >>>> University of Utah >>>> 15 North 2030 East, Room 2100 >>>> Salt Lake City, UT 84112-5330 >>>> ________________________________________ >>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>> USFS >>>> Ion PGM [nextgen.usfs at gmail.com] >>>> Sent: Monday, April 07, 2014 4:34 PM >>>> To: maker-devel at yandell-lab.org >>>> Subject: [maker-devel] fasta_merge ARRAY error >>>> >>>> Hello, >>>> >>>> I?m getting an error when running fasta_merge as follows: >>>> >>>> Can't use an undefined value as an ARRAY reference at >>>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>>> >>>> The result is that the fasta files are somewhat truncated, that is they >>>> do not match the gff3 file created from gff3_merge (which does run >>>> without any errors). Seems like it is getting stuck somewhere and then >>>> crashes. Is there another way to easily get the CDS out of the maker >>>> generated GFF file? 
>>>> >>>> Thanks, >>>> >>>> Jon >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> From sjackman at gmail.com Tue Apr 8 13:21:38 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 8 Apr 2014 12:21:38 -0700 Subject: [maker-devel] Changing rmlib runs RepeatRunner Message-ID: Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 8 14:00:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 14:00:11 -0600 Subject: [maker-devel] Changing rmlib runs RepeatRunner In-Reply-To: References: Message-ID: RepeatRunner runs on what was not masked by RepeatMasker, so changing rmlib can cause RepeatRunner to give slightly different results because RepeatMasker results changed. --Carson From: Shaun Jackman Date: Tuesday, April 8, 2014 at 1:21 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Changing rmlib runs RepeatRunner Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sjackman at gmail.com Thu Apr 10 12:34:34 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 10 Apr 2014 11:34:34 -0700 Subject: [maker-devel] Using GlimmerHMM with MAKER Message-ID: The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA and CDS features, but it does not include gene or exon features, and so it does not appear to be working with MAKER. Has anyone else used GlimmerHMM with MAKER, and how did you deal with this issue? Cheers, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 10 12:53:55 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Apr 2014 12:53:55 -0600 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match for mRNA and match_part for CDS. Then it will be interpreted as a two level alignments feature which can be given to pred_gff. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, April 10, 2014 at 12:34 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Using GlimmerHMM with MAKER The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA and CDS features, but it does not include gene or exon features, and so it does not appear to be working with MAKER. Has anyone else used GlimmerHMM with MAKER, and how did you deal with this issue? Cheers, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu Apr 10 15:32:55 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 10 Apr 2014 14:32:55 -0700 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: Thanks, Carson. That helps. 
I'm trying to do a completely ab initio gene annotation without any est or protein homology evidence, at least for now. The GFF file produced by maker is empty. How do I carry the GlimmerHMM pred_gff (or model_gff) annotations through to the end? Ultimately, I'd like to merge annotations from multiple ab initio predictions.

Cheers,
Shaun

On 10 April 2014 11:53, Carson Holt wrote:

> Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match
> for mRNA and match_part for CDS. Then it will be interpreted as a two
> level alignments feature which can be given to pred_gff.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Thursday, April 10, 2014 at 12:34 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] Using GlimmerHMM with MAKER
>
> The GlimmerHMM gene prediction software outputs a GFF file that includes
> mRNA and CDS features, but it does not include gene or exon features, and
> so it does not appear to be working with MAKER. Has anyone else used
> GlimmerHMM with MAKER, and how did you deal with this issue?
>
> Cheers,
> Shaun
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Apr 10 15:35:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 15:35:17 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

keep_preds=1 will force MAKER to keep ab initio results even if there is no evidence supporting them.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Thursday, April 10, 2014 at 3:32 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Using GlimmerHMM with MAKER

Thanks, Carson. That helps.
I'm trying to do a completely ab initio gene annotation without any est or protein homology evidence, at least for now. The GFF file produced by maker is empty. How do I carry the GlimmerHMM pred_gff (or model_gff) annotations through to the end? Ultimately, I'd like to merge annotations from multiple ab initio predictions.

Cheers,
Shaun

On 10 April 2014 11:53, Carson Holt wrote:
> Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute match for
> mRNA and match_part for CDS. Then it will be interpreted as a two level
> alignments feature which can be given to pred_gff.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Thursday, April 10, 2014 at 12:34 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] Using GlimmerHMM with MAKER
>
> The GlimmerHMM gene prediction software outputs a GFF file that includes mRNA
> and CDS features, but it does not include gene or exon features, and so it
> does not appear to be working with MAKER. Has anyone else used GlimmerHMM with
> MAKER, and how did you deal with this issue?
>
> Cheers,
> Shaun
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sjackman at gmail.com Thu Apr 10 16:51:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 15:51:34 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER
In-Reply-To:
References:
Message-ID:

That worked! Thanks again, Carson.

A note for the record: I found that keep_preds=1 carries forward pred_gff annotations, but not model_gff annotations when that GFF file uses match and match_part annotations (like a munged GlimmerHMM GFF file), which makes sense I guess now that I think about it.
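
The substitution discussed in this thread (relabel mRNA features as match and CDS features as match_part so the file can be passed to pred_gff) could be sketched roughly as below. This is only an illustration, not a script from MAKER or GlimmerHMM: the function name `convert_gff3_types` and the two example records are made up, and it assumes the input is tab-delimited GFF3 with nine columns.

```python
import io

# Feature-type substitutions suggested in the thread; all other types are left alone.
TYPE_MAP = {"mRNA": "match", "CDS": "match_part"}


def convert_gff3_types(lines):
    """Yield GFF3 lines with column 3 (feature type) rewritten via TYPE_MAP."""
    for line in lines:
        line = line.rstrip("\n")
        # Pass blank lines, comments and ## directives through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        # Only touch well-formed nine-column feature lines.
        if len(cols) == 9:
            cols[2] = TYPE_MAP.get(cols[2], cols[2])
        yield "\t".join(cols)


# Hypothetical two-feature input, invented for this example.
example = io.StringIO(
    "##gff-version 3\n"
    "scaffold1\tGlimmerHMM\tmRNA\t100\t900\t.\t+\t.\tID=m1\n"
    "scaffold1\tGlimmerHMM\tCDS\t100\t400\t.\t+\t0\tID=c1;Parent=m1\n"
)
converted = list(convert_gff3_types(example))
for row in converted:
    print(row)
```

Going by the messages in this thread, a file converted this way would be given to pred_gff together with keep_preds=1; model_gff is described as ignoring match/match_part features.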
Cheers, Shaun On 10 April 2014 14:35, Carson Holt wrote: > keep_preds=1 will force MAKER to keep ab initio results even if their is > no evidence supporting them. > > --Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, April 10, 2014 at 3:32 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Using GlimmerHMM with MAKER > > Thanks, Carson. That helps. I'm trying to do a completely ab initio gene > annotation without any est or protein homology evidence, at least for now. > The GFF file produce by maker is empty. How do I carry the GlimmerHMM > pred_gff (or model_gff) annotations through to the end? Ultimately, I'd > like to merge annotations from multiple ab initio predictions. > > Cheers, > Shaun > > > On 10 April 2014 11:53, Carson Holt wrote: > >> Make sure it's not GTF or GFF2, but if it is GFF3 You can substitute >> match for mRNA and match_part for CDS. Then it will be interpreted as a >> two level alignments feature which can be given to pred_gff. >> >> --Carson >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Thursday, April 10, 2014 at 12:34 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] Using GlimmerHMM with MAKER >> >> The GlimmerHMM gene prediction software outputs a GFF file that includes >> mRNA and CDS features, but it does not include gene or exon features, and >> so it does not appear to be working with MAKER. Has anyone else used >> GlimmerHMM with MAKER, and how did you deal with this issue? >> >> Cheers, >> Shaun >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Thu Apr 10 16:55:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Apr 2014 16:55:07 -0600 Subject: [maker-devel] Using GlimmerHMM with MAKER In-Reply-To: References: Message-ID: The model_gff option can only take gene/mRNA/exon/CDS features, and will ignore match/match_part features. It's a little more restrictive. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, April 10, 2014 at 4:51 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Using GlimmerHMM with MAKER model_gff -------------- next part -------------- An HTML attachment was scrubbed... URL: From rbharris at uw.edu Mon Apr 14 19:45:13 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Mon, 14 Apr 2014 18:45:13 -0700 Subject: [maker-devel] empty genome.ann/genome.dna Message-ID: Hi, I recently set up MAKER on a new computer and am having trouble running a dataset that was run successfully on a different computer. After MAKER is finished, I ran gff3_merge and maker2zff and it returns empty genome.ann and genome.dna files. I have tried installing older versions of dependencies and have tinkered with the control files but I still can't figure out what the issue is. The only difference I can find is that the .all.gff file from a successfully run file has lines at the beginning of the file reporting the success of exonerate. On the failing version of maker - these are not reported - it just goes strait to fasta output. However, exonerate appears to work successfully when run outside of the maker pipeline. Any help would be greatly appreciated. Thanks! Rebecca -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Tue Apr 15 09:33:45 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 15 Apr 2014 09:33:45 -0600 Subject: [maker-devel] empty genome.ann/genome.dna In-Reply-To: References: Message-ID: Could you upload your control files and job input files here--> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi I'll take a look to see if there is any problem with your job's setup. Also what version of MAKER are you running? --Carson From: Rebecca Harris Date: Monday, April 14, 2014 at 7:45 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] empty genome.ann/genome.dna Hi, I recently set up MAKER on a new computer and am having trouble running a dataset that was run successfully on a different computer. After MAKER is finished, I ran gff3_merge and maker2zff and it returns empty genome.ann and genome.dna files. I have tried installing older versions of dependencies and have tinkered with the control files but I still can't figure out what the issue is. The only difference I can find is that the .all.gff file from a successfully run file has lines at the beginning of the file reporting the success of exonerate. On the failing version of maker - these are not reported - it just goes strait to fasta output. However, exonerate appears to work successfully when run outside of the maker pipeline. Any help would be greatly appreciated. Thanks! Rebecca _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bioinformatics.umd at gmail.com Tue Apr 15 11:01:37 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Tue, 15 Apr 2014 13:01:37 -0400 Subject: [maker-devel] passing names from a gff to new predictions Message-ID: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Hello I have an interesting issue with an existing Maker gff. I have a gff file with human friendly names that I would like to pass to the new predictions. However, some of those genes in the human friendly gff file are incorrect or have errors. If I use the gff as model_gff or pred_gff with the map_forward=1 the names move but so do the incorrect models. Maker simply duplicates these predictions to the new outputs. If I remove the GFF file from the ctl file I get new predictions, that have the necessary corrections but they now have unfriendly names. Do you have any suggestions on how to associate the old names with the new predictions? I could simple blast the old proteins vs the new ones and associate them in that manor but I was wondering if there were any other options within Maker. Since I have the GFF files I also have the associated transcripts and proteins. Do I need to do some iteration of est2/genome then generate a new model gff file? The issue we are dealing with is thousands of short introns in our gff file. These are less than 20 bp and are not biologically feasible so we are trying to correct the gene model predictions. Cheers Ian From carsonhh at gmail.com Tue Apr 15 11:31:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 15 Apr 2014 11:31:35 -0600 Subject: [maker-devel] passing names from a gff to new predictions In-Reply-To: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Message-ID: If you give anything to pred_gff or model_gff then it is allowed to compete as a predictor and thus can end up in the final results. 
You stated that the models you are passing in have errors, and you don't want them to be allowed to compete and end up in your final models? Correct.

MAKER is not made to expect erroneous input, so I don't have an easy solution for you (I do have a less easy solution, though, but you will need to do some editing of the MAKER code).

1. Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or vi.
2. Search for the 'best_annotations' subroutine (around line 1248, depending on which version of MAKER you have).
3. Then edit it as follows:

This is how the top section of the subroutine should look at first -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = @{$CTL_OPT->{_predictor}};

        ...

Change it to this -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}};

        ...

Now run maker again with your old GFF3 file as input to model_gff, and just remember to change the MAKER code back to the way it was when you're done with everything. Basically the change will hard filter model_gff results from being allowed into your final annotations. So names will still move from model_gff to your final results with the map_forward=1 option, but none of the old models will make it as gene/mRNA/exon/CDS features in the final GFF3 (they will still be listed as match/match_part reference features though).

Thanks,
Carson

On 4/15/14, 11:01 AM, "UMD Bioinformatics" wrote:

> Hello
>
> I have an interesting issue with an existing Maker gff. I have a gff file with
> human friendly names that I would like to pass to the new predictions.
> However, some of those genes in the human friendly gff file are incorrect or
> have errors. If I use the gff as model_gff or pred_gff with the map_forward=1
> the names move but so do the incorrect models. Maker simply duplicates these
> predictions to the new outputs.
If I remove the GFF file from the ctl file I > get new predictions, that have the necessary corrections but they now have > unfriendly names. Do you have any suggestions on how to associate the old > names with the new predictions? I could simple blast the old proteins vs the > new ones and associate them in that manor but I was wondering if there were > any other options within Maker. > > Since I have the GFF files I also have the associated transcripts and > proteins. > Do I need to do some iteration of est2/genome then generate a new model gff > file? > > The issue we are dealing with is thousands of short introns in our gff file. > These are less than 20 bp and are not biologically feasible so we are trying > to correct the gene model predictions. > > Cheers > Ian > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bioinformatics.umd at gmail.com Tue Apr 15 11:54:00 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Tue, 15 Apr 2014 13:54:00 -0400 Subject: [maker-devel] passing names from a gff to new predictions In-Reply-To: References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com> Message-ID: <31BC21FD-D9D6-4B66-B0D7-C48FBC3B7A98@gmail.com> Carson, That seems to fix this issue. Thanks for the insight not something I would have ever come up with. Cheers Ian On Apr 15, 2014, at 1:31 PM, Carson Holt wrote: > If you give anything to pred_gff or model_gff then it is allowed to compete as a predictor and thus can end up in the final results. You stated that the models you are passing in have errors, and you don't want them to be allowed to compete and end up in your final models? Correct. 
> > MAKER is not made to expect erroneous input, so I don't have an easy solution for you (I do have a less easy solution though; but you will need to do some editing of the MAKER code). > > Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or vi. > Search for the 'best_annotations' subroutine (around line 1248 depending on which version of MAKER you have). > Then edit it as follows: > > This is how the top section of the subroutine should look at first --> > > sub best_annotations { > my $annotations = shift; > my $CTL_OPT = shift; > > my @predictors = @{$CTL_OPT->{_predictor}}; > > ... > > Change it to this --> > > sub best_annotations { > my $annotations = shift; > my $CTL_OPT = shift; > > my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}}; > > ... > > > > Now run maker again with your old GFF3 file as input to model_gff, and just remember to change the MAKER code back to the way it was when your done with everything. Basically the change will hard filter model_gff results from being allowed into your final annotations. So names will still move from model_gff to your final results with the map_forward=1 option but none of the old models will make it as gene/mRNA/exon/CDS features in the final GFF3 (they will still be listed as match/match_part reference features though). > > Thanks, > Carson > > > > On 4/15/14, 11:01 AM, "UMD Bioinformatics" wrote: > >> Hello >> >> I have an interesting issue with an existing Maker gff. I have a gff file with human friendly names that I would like to pass to the new predictions. However, some of those genes in the human friendly gff file are incorrect or have errors. If I use the gff as model_gff or pred_gff with the map_forward=1 the names move but so do the incorrect models. Maker simply duplicates these predictions to the new outputs. If I remove the GFF file from the ctl file I get new predictions, that have the necessary corrections but they now have unfriendly names. 
Do you have any suggestions on how to associate the old names with the new predictions? I could simple blast the old proteins vs the new ones and associate them in that manor but I was wondering if there were any other options within Maker. >> >> Since I have the GFF files I also have the associated transcripts and proteins. >> Do I need to do some iteration of est2/genome then generate a new model gff file? >> >> The issue we are dealing with is thousands of short introns in our gff file. These are less than 20 bp and are not biologically feasible so we are trying to correct the gene model predictions. >> >> Cheers >> Ian >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.king at rothamsted.ac.uk Wed Apr 16 05:27:09 2014 From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth)) Date: Wed, 16 Apr 2014 11:27:09 +0000 Subject: [maker-devel] scalar text in maker transcripts Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8DAC@rothex1.rothamsted.ac.uk> Hi, I've got some strange characters in my maker transcripts (I used keep predictions). I opened the file in wordpad ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad 88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT Any ideas what may cause this? Thanks Rob -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Wed Apr 16 15:56:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Apr 2014 15:56:25 -0600
Subject: [maker-devel] scalar text in maker transcripts
Message-ID:

The only time I have seen this is when fgenesh is used as a predictor and correct_est_fusion=1 is set (it was a bug in trimming long UTR's on fgenesh models). Is that how you have your job configured? If so, that particular bug is fixed in the current MAKER release.

Thanks,
Carson

From: "Robert King (RRes-Roth)"
Date: Wednesday, April 16, 2014 at 5:27 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] scalar text in maker transcripts

Hi,

I've got some strange characters in my maker transcripts (I used keep predictions). I opened the file in wordpad

ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC
AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad
88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT

Any ideas what may cause this?

Thanks
Rob

--
This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning.
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.king at rothamsted.ac.uk Wed Apr 16 15:57:44 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Wed, 16 Apr 2014 21:57:44 +0000
Subject: [maker-devel] scalar text in maker transcripts
In-Reply-To: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
References: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8E85@rothex1.rothamsted.ac.uk>

Yep I am. I'll try upgrading.
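
To see how many transcripts carry the SCALAR(0x...) text reported in this thread, something along the following lines might help. It is a rough sketch rather than a MAKER utility: the function name `find_scalar_residue` and the record name `t1` are invented for the example, and it assumes an ordinary FASTA file whose sequence lines may be wrapped (so a SCALAR token can be split across two lines, as in the excerpt above).

```python
import re

# Matches the stringified Perl reference text seen in the transcripts.
SCALAR_RE = re.compile(r"SCALAR\(0x[0-9a-fA-F]+\)")


def find_scalar_residue(fasta_text):
    """Return {record_id: number_of_SCALAR_tokens} for affected FASTA records."""
    hits = {}
    header = None
    chunks = []

    def flush():
        # Join wrapped lines first so tokens split by line wrapping are found.
        if header is not None:
            count = len(SCALAR_RE.findall("".join(chunks)))
            if count:
                hits[header] = count

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            header = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.strip())
    flush()
    return hits


# The three sequence lines quoted in the message, under a made-up record name.
example = (
    ">t1\n"
    "ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC\n"
    "AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad\n"
    "88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT\n"
)
report = find_scalar_residue(example)
print(report)
```

Running this over a transcripts file before and after upgrading would show whether the affected records have gone away.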
Thanks Rob
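For anyone who wants to check their own transcript files for the symptom described in this thread, here is a minimal Python sketch. It is not part of MAKER; the function name, the record IDs in the example and the list of reference types in the pattern are assumptions made for illustration. It joins the wrapped sequence lines of each FASTA record before searching, because (as in Rob's example) a stringified Perl reference such as SCALAR(0x...) can be split across a line break.

```python
import re

# Perl prints a reference as TYPE(0xHEX) when it is accidentally used as a string.
# The set of types listed here is an assumption; extend it if needed.
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE|REF|GLOB)\(0x[0-9a-fA-F]+\)")


def find_perl_refs(fasta_text):
    """Return {record_id: [reference strings]} for records that contain them."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif line and current is not None:
            records[current].append(line)

    hits = {}
    for rec_id, chunks in records.items():
        # Join wrapped lines so a reference split by a line break is still found.
        found = PERL_REF.findall("".join(chunks))
        if found:
            hits[rec_id] = found
    return hits


# Sequence lines copied from the message above; the record IDs are made up.
example = """>transcript_1
ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC
AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad
88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT
>transcript_2
ACGTACGT
"""

print(find_perl_refs(example))
# {'transcript_1': ['SCALAR(0xf5397d8)', 'SCALAR(0xc4cad88)']}
```

A check like this only reports affected records. Per Carson's reply, the fix itself is upgrading to a MAKER release where the bug is corrected.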
-------------- next part -------------- An HTML attachment was scrubbed... URL: From muriel.grosb at gmail.com Mon Apr 7 06:29:42 2014 From: muriel.grosb at gmail.com (Muriel Gros-Balthazard) Date: Mon, 7 Apr 2014 14:29:42 +0200 Subject: [maker-devel] Help for Repeat Library Construction Message-ID: <474C2DF8-B5DF-424B-BCF7-EC64BC23EEDC@gmail.com> Hello, I am working on the annotation of the date palm genome using the MAKER pipeline. I started by following the manual for Repeat Library Construction - Advanced. I am stuck at 2.1.3, where I should use muscle to filter, but I don't understand what the file flankingseqfile is. How can I obtain it? Also, do you have more information about 2.1.4 and 2.1.5? Thanks a lot for this great pipeline and for your help, Muriel Gros-Balthazard From Brian.Mack at ARS.USDA.GOV Thu Apr 17 14:34:21 2014 From: Brian.Mack at ARS.USDA.GOV (Mack, Brian) Date: Thu, 17 Apr 2014 20:34:21 +0000 Subject: [maker-devel] tbl2asn errors Message-ID: Hi, I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER gff3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to have converted fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file: 4 ERROR: SEQ_FEAT.BadTrailingCharacter 217 ERROR: SEQ_FEAT.NoStop 438 ERROR: SEQ_FEAT.ShortIntron 171 ERROR: SEQ_FEAT.StartCodon 171 ERROR: SEQ_INST.BadProteinStart 291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor 648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor 118 WARNING: SEQ_FEAT.ShortExon In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?
Thanks, Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 17 14:59:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Apr 2014 14:59:05 -0600 Subject: [maker-devel] tbl2asn errors Message-ID: The only one that may be a real error is the first one (I'm not sure what it means). You probably need to find them and open them in a viewer like apollo. The rest I would consider warnings (the NCBI tool doesn't like any weirdness or uncertainty). You often have to manually edit things to get NCBI to accept all models without complaining (sometimes even going against real biology). I know some groups use the always_complete=1 option in MAKER to force start and stop codons into every model for example (even though those forced codons are probably false). 
*Not sure about this one --> 4 ERROR: SEQ_FEAT.BadTrailingCharacter
*These are partial genes with no stop (usually happen at the edge of contigs or near strings of NNNN) --> 217 ERROR: SEQ_FEAT.NoStop
*These are just short introns (intron size is under control of the ab initio predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) --> 171 ERROR: SEQ_FEAT.StartCodon
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) --> 171 ERROR: SEQ_INST.BadProteinStart
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) --> 291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) --> 648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
*These are just short exons (exon size is under control of the ab initio predictors) --> 118 WARNING: SEQ_FEAT.ShortExon
You probably need to identify examples of models causing each issue, and then look at them in Apollo. Apollo lets you open tbl format and save back to it. I imagine the coordinate change is from NCBI using a 0-based coordinate system as opposed to a 1-based system (i.e. the first base is 0 rather than 1). Unfortunately, getting everything to go into NCBI is usually a grueling task. --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From Scott.Geib at ARS.USDA.GOV Thu Apr 17 14:59:22 2014 From: Scott.Geib at ARS.USDA.GOV (Geib, Scott) Date: Thu, 17 Apr 2014 20:59:22 +0000 Subject: [maker-devel] tbl2asn errors In-Reply-To: References: Message-ID: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> Hi Brian, We have a tool to deal with this in development. You should not directly upload your maker output to NCBI; you need to filter out genes, check that things are sane, etc. http://brianreallymany.github.io/GAG/ It is still in active development; the first full release is planned for the end of this month (if you can wait 1.5 weeks).
It has no dependencies and maintains parent/child relationships (for example, if you remove a gene, it will also remove the associated CDS/mRNA). In a release planned for the end of the month, you will be able to perform functions like removing short features, long features, flagging things for review, etc. It also generates an updated genome.fasta file, gff3 file, and sequence files for CDS/mRNA/peptide based on edits made. Hopefully this is helpful to you. Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 17 15:27:53 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Apr 2014 15:27:53 -0600 Subject: [maker-devel] tbl2asn errors In-Reply-To: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> Message-ID: Very cool. I'll try it out as well. --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From Scott.Geib at ARS.USDA.GOV Thu Apr 17 16:37:49 2014 From: Scott.Geib at ARS.USDA.GOV (Geib, Scott) Date: Thu, 17 Apr 2014 22:37:49 +0000 Subject: [maker-devel] tbl2asn errors In-Reply-To: References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net> Message-ID: <0D54878997A4B9478F03938D61DB51D4266C1E@001FSN2MPN1-015.001f.mgd2.msft.net> Just so you are not discouraged: the current version has limited functionality and is pretty much undocumented (although it will write a .tbl file). I will email the list when the first real release is complete and documented. Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From bioinformatics.umd at gmail.com Fri Apr 18 07:14:45 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Fri, 18 Apr 2014 09:14:45 -0400 Subject: [maker-devel] Short Introns Message-ID: Hello, We are preparing two submissions for NCBI (a nightmare). However, some of our MAKER gene models have short introns that are being flagged by NCBI. In one species we have >400 introns smaller than 20bp, which is almost biologically impossible. I know we can set max intron length in the opts.ctl file, but can we set a minimum intron length? I saw yesterday's posts that mention this is a result of the external ab initio predictors, but I didn't see an indication as to which predictor and how to change that setting. from yesterday: *These are just short introns (intron size is under control of the ab initio predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron Cheers Ian From carsonhh at gmail.com Fri Apr 18 09:35:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 18 Apr 2014 09:35:51 -0600 Subject: [maker-devel] Short Introns In-Reply-To: References: Message-ID: Look at the name of those genes. The original name will let you know where it came from because it will contain augustus, genemark, snap, etc. You will also want to open up the contig containing those genes in a viewer like Apollo (http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz). See if the short intron is part of the CDS or UTR. If it's UTR, then it has evidence support from an EST, which either means there are problems with the EST/cDNA evidence or it's real.
For those, even if they are real you can just trim them off. If it's part of the CDS, then investigate whether it is suggested by EST or protein evidence, or if the ab initio predictor called it (sometimes the ab initio predictor calls things to force an ORF to work). This can sometimes be indicative of assembly issues in that region. --Carson From michael.seidl at wur.nl Tue Apr 22 08:27:18 2014 From: michael.seidl at wur.nl (Michael Seidl) Date: Tue, 22 Apr 2014 16:27:18 +0200 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Message-ID: Hi, I have a question on the post-processing of my maker output. I finished a maker run on a draft genome (231 scaffolds) without an error. To get a merged gff3 I ran ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. However, I realized that, next to GFF3-conformant output, it contains thousands of lines of array refs, e.g. ARRAY(0x188a8578).
The total number of produced scaffolds is correct; however, I have my doubts whether I successfully retrieved all annotations. Could you maybe point me to a possible solution? Thanks in advance Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 22 08:31:16 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:31:16 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: Message-ID: I've never seen this. What version of MAKER are you using? --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.seidl at wur.nl Tue Apr 22 08:37:33 2014 From: michael.seidl at wur.nl (Michael Seidl) Date: Tue, 22 Apr 2014 16:37:33 +0200 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Hi Carson, I am using maker 2.31.
Thanks Michael -- *Michael F Seidl, PhD* Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 22 08:39:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:39:51 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Could you check the individual contig GFF3 files before the merge? Do any of those contain array refs? Also, is it exactly 2.31 or the current 2.31.3?
--Carson -------------- next part -------------- An HTML attachment was scrubbed...
URL: From michael.seidl at wur.nl Tue Apr 22 08:43:44 2014 From: michael.seidl at wur.nl (Michael Seidl) Date: Tue, 22 Apr 2014 16:43:44 +0200 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt wrote: > any Dear Carson, maker -version returns 2.31. Yes, also the individual scaffolds seem to contain ARRAY refs, e.g. find -name "*gff" | xargs grep "ARRAY": ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 Cheers M -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Tue Apr 22 08:46:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 08:46:34 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> Message-ID: Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Thanks, Carson -------------- next part -------------- An HTML attachment was scrubbed...
URL: From a.priyam at qmul.ac.uk Tue Apr 22 11:45:45 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Tue, 22 Apr 2014 23:15:45 +0530 Subject: [maker-devel] is using est_reads option safe? Message-ID: Hi, I need to run MAKER against a genome with both raw (FASTQ) and assembled (FASTA) RNA-Seq data. I point MAKER to the assembled data using the est= option in maker_opts.ctl. Looking for how to point MAKER to the raw reads, I came across this thread https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where Dr. Carson Holt points out that est_gff should be used. However, from MAKER's run log it seems that the est_reads option is not deprecated, just hidden from plain sight by excluding it from maker_opts.ctl. So I set the est_reads option in maker_opts.ctl, and MAKER parses the control files and runs just fine. Now I am left wondering if it's safe to use est_reads. As in, could it impact the predicted set negatively? -- Priyam From carsonhh at gmail.com Tue Apr 22 12:02:56 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 12:02:56 -0600 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: The est_reads option doesn't do anything. It is in the run log for backwards compatibility with old jobs because MAKER has a restart capability (i.e. people can rerun new MAKER versions against old MAKER output in the same directory - it can reuse old raw results to avoid rerunning analysis steps). The est_reads option was originally there for developer experimentation, but then it went away. You need to use an external tool like tophat and cufflinks to align short reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough option you mentioned). Or you can assemble them without alignment using something like trinity (then you can provide that result to the est= option because it will be in fasta format).
You should not use raw reads directly with MAKER; you need to preprocess them using one of the methods mentioned for them to be useful.

Thanks,
Carson

From carsonhh at gmail.com Tue Apr 22 13:10:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 13:10:46 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl>
Message-ID:

The issue was indeed caused by a bug in using the other_gff= file option. Could you place the attached file in .../maker/lib/. Then you can rerun maker to test if it fixes it ('maker -a' for fast rerun without analysis rerun).

Alternately, if you don't feel like rerunning everything, you can also filter out the lines using --> grep -v "ARRAY" file.gff

Since the other_gff file is not used in any part of the analysis and is just a convenience option that prints any text given to it into the final GFF3 file, filtering the lines out is the same as if you had left other_gff blank when running MAKER. You can then use 'gff3_merge -s tophat.gff merged_genome.gff' to merge the desired extra lines back into your file outside of MAKER.

Thanks,
Carson

From: Michael Seidl
Date: Tuesday, April 22, 2014 at 12:29 PM
To: Carson Holt
Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'

Hi Carson,

I uploaded the files as an archive.

Thanks
Michael

On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt wrote:
> In the base maker.output directory for the job, there will be a file with a
> .db extension. Could you send that as well? I'm leaning towards this being
> something odd happening with the GFF3 files used as input. Particularly the
> other_gff= file. Could you upload this file as well -->
> /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/tophat.gff3.
>
> --Carson
>
>
> From: Michael Seidl
> Date: Tuesday, April 22, 2014 at 8:56 AM
> To: Carson Holt
> Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
>
> Should be uploading right now...
>
> Thanks Michael

--
Michael F Seidl, PhD
Research Fellow (Postdoc)
Laboratory of Phytopathology
Wageningen University
P.O. Box 8025, 6700 EE Wageningen
Wageningen Campus, building 107 (Radix)
Droevendaalsesteeg 1, 6708 PB Wageningen

Tel.: +31-317-481288
Fax: +31-317-483412

Email: michael.seidl at wur.nl
Website: http://www.php.wur.nl/UK/
Twitter: @MFSeidl

www.disclaimer-uk.wur.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GFFDB.pm
Type: text/x-perl-script
Size: 52153 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue Apr 22 14:35:31 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 14:35:31 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl>
Message-ID:

You can provide a comma separated list of files to est_gff. Also, from experience, cufflinks gives far better results than tophat. Tophat tends to have a lot of false positives that adversely affect the overall quality of gene models, so I usually recommend that people use cufflinks output and not even include the tophat results in their run.

Thanks,
Carson

From: Michael Seidl
Date: Tuesday, April 22, 2014 at 2:30 PM
To: Carson Holt
Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'

Dear Carson,

thanks a lot I will try.
More importantly, you pointed me to a mistake in my procedure which will make me rerun maker anyway :p

I want maker to use the tophat.gff next to cufflinks est (fa + gff) as well as a protein.fa. I provide them currently as follows:

#-----EST Evidence (for best results provide a file for at least one)
est= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.cds.fa #set of ESTs or assembled mRNA-seq
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.gff3 #aligned ESTs or mRNA-seq from a
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= /home/michael/data/side/alternaria/maker_annotation/fungal_proteins.fa #protein sequence file in fasta format (i.e. from mu
protein_gff= #aligned protein homology evidence from an external GFF3 file

Can I give the tophat.gff as altest_gff, or is maker internally using est_gff and altest_gff differently?

Sorry for this question, but I had not yet realized that the other_gff will be omitted during maker.

Thanks a lot
Michael

--
Michael F Seidl, PhD
Research Fellow (Postdoc)
Laboratory of Phytopathology
Wageningen University
P.O. Box 8025, 6700 EE Wageningen
Wageningen Campus, building 107 (Radix)
Droevendaalsesteeg 1, 6708 PB Wageningen

Tel.: +31-317-481288
Fax: +31-317-483412

Email: michael.seidl at wur.nl
Website: http://www.php.wur.nl/UK/
Twitter: @MFSeidl

www.disclaimer-uk.wur.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From a.priyam at qmul.ac.uk Wed Apr 23 03:55:37 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Wed, 23 Apr 2014 15:25:37 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

Thanks, Carson.

I now understand that I shouldn't use the est_reads option.

Does MAKER utilise est_gff for prediction, or does it simply pass the annotations through to the output GFF? In that case, how is it different from using other_gff / model_gff (what's the difference between these two?)

I have both assembled and raw reads. Is it sufficient to just use the assembled set?

-- Priyam

From carsonhh at gmail.com Wed Apr 23 08:43:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 23 Apr 2014 08:43:54 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

est_gff is the equivalent of est=, but because the alignment structure is already in the GFF3, I don't need to align sequence with blastn/exonerate. model_gff and pred_gff are essentially the same, with the difference being that model_gff entries can be kept in the final results even without supporting evidence, but pred_gff entries won't be. pred_gff needs evidence support because it is a potential model, whereas model_gff is considered a known model even if the structure of that model may be uncertain.

other_gff is just a convenience method for passing through GFF3 features to the final result.
It's impossible to have MAKER be aware of every kind of possible entry, so if you want something more exotic in the final output (sequence variant information, alternate alleles, promoter and methylation sites, etc.) then you can pass it in there and it will just be printed into the file. It's basically the equivalent of concatenating two GFF3 files together, but it handles the proper reordering of sequence information at the end of the GFF3 file (because technically you can't just concatenate GFF3 files end-to-end). You can also use the gff3_merge tool that comes with MAKER to get the same effect.

--Carson

From kdelmore at zoology.ubc.ca Tue Apr 22 22:48:08 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 22 Apr 2014 21:48:08 -0700
Subject: [maker-devel] problem with dsindex
Message-ID: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>

I am having some trouble with the dsindex tool. I used the fasta_tool to split my original multifasta file and ran maker with the -base and -g flags. I then used the dsindex tool to summarize results from each fasta. The tool finished without an error message and pointed me to where the files should be, but when I went to that directory there was no datastore, and the index.log said that it had started on each of the fastas but not finished.
I got around this problem using gff3_merge by using the -o option and providing paths to the gff files but this is not working with the fasta_merge tool. I don't want to just cat the files together because I want to be sure the merged gff and protein.fasta files are the same for downstream annotation steps. I've included examples of the commands I used below and the output from dsindex. Note that the individual fastas finished without errors and produced datastores.

I would really appreciate any input you might have with this problem and THANK YOU for developing such a user friendly pipeline.

/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides

maker/bin/maker -dsindex -fix_nucleotides
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/placed.maker.output/placed_datastore ##this directory was not generated
To access files for individual sequences use the datastore index:
/placed.maker.output/placed_master_datastore_index.log

/maker/bin/gff3_merge -o placed.gff *

/maker/bin/fasta_merge -o placed.all 1.maker.proteins.fasta 2.maker.proteins.fasta ##this did not work

From carson.holt at genetics.utah.edu Wed Apr 23 08:51:59 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 23 Apr 2014 14:51:59 +0000
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

I don't think all your contigs are finished or you did not supply the -base tag when running -dsindex. If it says STARTED rather than FINISHED, then the output files for that contig are missing from the directory it is looking at.
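
The STARTED versus FINISHED check described above can be sketched in the shell. This is only a sketch under an assumption: it treats the master datastore index log as tab-separated, with the sequence ID in column 1, its directory in column 2, and the status in column 3. The function name unfinished_contigs is hypothetical.

```shell
#!/bin/sh
# Print every sequence ID that has a STARTED entry but no FINISHED entry.
# Assumed log layout (tab-separated): <sequence id> <directory> <status>
# usage: unfinished_contigs <master datastore index log>
unfinished_contigs() {
    awk -F'\t' '
        $3 == "STARTED"  { started[$1] = 1 }
        $3 == "FINISHED" { finished[$1] = 1 }
        END {
            for (id in started)
                if (!(id in finished)) print id
        }
    ' "$1" | sort
}

# Example call, using the log path from this thread:
#   unfinished_contigs placed.maker.output/placed_master_datastore_index.log
```

If this prints nothing, every contig that was started also reached FINISHED in that log; any ID it does print points at a contig whose output is missing from the directory being indexed.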

For example, this is how you should be running everything -->

/maker/bin/fasta_tool --split placed.fasta
mpiexec -n 4 /maker/bin/maker -base placed -g 1.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 2.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 3.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 4.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 5.fasta -fix_nucleotides

Now all of the runs will write to placed.maker.output

Then you need to do this -->
maker/bin/maker -dsindex -base placed -g placed.fasta

Then it will rebuild the index for placed.maker.output/placed_master_datastore_index.log

Thanks,
Carson

From carsonhh at gmail.com Wed Apr 23 08:57:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 23 Apr 2014 08:57:34 -0600
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Also, fasta_merge works differently than gff3_merge. It requires the datastore index because it tries to find the directories and then 'type' and 'group' the fasta files in those directories. Without the datastore index, it is the equivalent of 'cat file1.fa file2.fa > file3.fa'. It also requires the '-i' flag when specifying individual fasta files.

--Carson

From a.priyam at qmul.ac.uk Thu Apr 24 01:28:38 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Thu, 24 Apr 2014 12:58:38 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

You say est_gff is the equivalent of est= (except that alignment structure is a part of gff). What would MAKER do if I set both est= and est_gff= options in maker_opts.ctl? Will it ignore est=?

-- Priyam

From carsonhh at gmail.com Thu Apr 24 08:15:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 24 Apr 2014 08:15:07 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

It will use both.
You can also provide multiple files to either using comma separated lists.

--Carson

From anurag08priyam at gmail.com Thu Apr 24 08:26:24 2014
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Thu, 24 Apr 2014 19:56:24 +0530
Subject: [maker-devel] is using est_reads option safe?

That answers all my questions. Thanks, Carson.

-- Priyam

From matthew.macmanes at unh.edu Sat Apr 26 08:56:25 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Sat, 26 Apr 2014 10:56:25 -0400
Subject: [maker-devel] Use of each() on hash

Hello,

I am getting a large number of errors while running maker on my ubuntu server.

Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x2045200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x837200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x9d1200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.

It is unclear how this affects the results or performance of the software, but these errors are repeated thousands of times in even a small run.

For the record: Maker 2.31, Ubuntu 14.04, perl 5.18.2, MPI via OpenMPI.

Compiled perl modules using ./build

Thanks for any insight anyone may have.

__________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

From carsonhh at gmail.com Sat Apr 26 09:26:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 09:26:24 -0600
Subject: [maker-devel] Use of each() on hash

The message appears to be coming from forks.pm. It is probably a warning added to perl 5.18.2, which is really, really new (other versions don't care about this), and most developers would not consider 5.18 a fully stable release for production purposes (it will have lots of test features and messages that will get improved or dropped rather quickly). You can try updating the forks module from CPAN. Otherwise I would ignore it, as forks is sufficiently tested to know it works (it's not a MAKER module, it's a widely used CPAN module - literally tens of thousands of scripts use it worldwide). The authors of forks.pm will take steps to silence the warning rather quickly, or the warning will be removed from the perl interpreter altogether.
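For readers who have not met this warning before: perl 5.18 and later emit it when each() is called on a hash that has had keys inserted since the iteration started, because the order each() follows after an insertion is not defined. What follows is only a minimal sketch of one common way to avoid the situation, by looping over a snapshot of the keys instead of calling each(). The helper drain_queue and its arguments are hypothetical illustrations and are not taken from forks.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not from forks.pm): process every key in %$queue,
# where handling one key may insert further keys into the same hash.
# $spawn maps a key to an array ref of keys to add once it is handled.
sub drain_queue {
    my ($queue, $spawn) = @_;
    my @handled;
    while (%$queue) {
        # keys() returns a fresh list, so the insertions made below
        # cannot disturb the list this inner loop walks over.
        for my $key (sort keys %$queue) {
            delete $queue->{$key};
            push @handled, $key;
            $queue->{$_} = 1 for @{ $spawn->{$key} || [] };
        }
    }
    return @handled;
}

my @order = drain_queue({ a => 1, b => 1 }, { a => ['c'] });
print "@order\n";    # prints "a b c"
```

Entries added during the inner loop are picked up on the next pass of the outer while loop, so nothing is skipped and each() is never called on a modified hash.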
Thanks,
Carson

Sent from my iPhone

From cjfields at illinois.edu Sat Apr 26 21:34:16 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 27 Apr 2014 03:34:16 +0000
Subject: [maker-devel] Use of each() on hash
Message-ID: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

See this RT ticket:

https://rt.cpan.org/Public/Bug/Display.html?id=86910

The specific warning in question is there for a good reason; Reini Urban wrote about it recently and why it is bad:

http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html

There is a possible 2-line fix, mainly changing a while loop to a for loop, but the bug (originally reported in summer 2013) is still unfortunately open.

Just a note, I don't agree that perl 5.18.2 is a development release. Even-numbered minor releases (5.10, 5.12, ...) are considered stable/production; odd-numbered ones (5.19) are developer releases. I do agree that initial .0 "patch" releases (e.g. 5.18.0) are generally to be avoided, but I always try to use a more recent version of perl when possible. This version is two releases past the .0, and perl 5.20 (next stable) is due next month.

chris

From carsonhh at gmail.com Sat Apr 26 22:06:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:06:46 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

Yah, I had already seen that ticket. It's related to changing the function from a while loop to a foreach loop just to suppress the warning. Not sure why the forks.pm maintainer hasn't looked at it, but I imagine he will probably just do something more like --> no warnings qw(each); or whatever would suppress that warning without altering anything else in the code.

I wouldn't say 5.18 is a development release. What I said is that it's not good for 'production'. The problem is that most systems still use 5.10 and 5.12, with a very few only recently moving to 5.16 (amazon's EC2 images for example). So you will find that issues with even very popular CPAN modules (as we see here) will be more common in something like 5.18.X. Not because 5.18 is flawed or buggy, but because it's not yet used enough to flush out all the secondary issues it can cause elsewhere in the wider world of perl.

Thanks,
Carson

From carsonhh at gmail.com Sat Apr 26 22:51:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:51:30 -0600
Subject: [maker-devel] Use of each() on hash
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

If you don't want to wait for the forks.pm maintainer to alter his code and submit an update to CPAN, you should be able to suppress the warning by manually editing forks.pm line 1736 yourself.

Change it from this -->

    $write = each %WRITE;

To this (make sure to include the {} brackets) -->

    { no warnings qw(internal); $write = each %WRITE; }

The issue arises because the module's author has his code calling 'each', altering the hash, and then calling 'each' again, which causes a warning in perl 5.18+.
In this case it's relatively innocuous because of how the value and 'each' function are being used (any hash reordering ends up being handled in an outer while loop).

Thanks,
Carson

From muriel.grosb at gmail.com Mon Apr 28 02:35:25 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 28 Apr 2014 10:35:25 +0200
Subject: [maker-devel] Repeat Library Construction : Exclusion of gene fragments
Message-ID: <535E12CD.9020302@gmail.com>

Hello!

I ran RepeatModeler and separated the output into ModelerID.lib and Modelerunknown.lib as explained in the protocol. In total, I have about 600 sequences in these two files. I now want to exclude gene fragments. I downloaded all the plant protein sequences from UniProtDB and plan to use blastx. However, I don't know which parameters I should use for blastx, especially the -e value?
Thanks a lot for your help,
Muriel GB

From mhinsley at ebi.ac.uk Tue Apr 29 02:21:06 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Tue, 29 Apr 2014 09:21:06 +0100
Subject: [maker-devel] unexpected alternate splicing
Message-ID: <535F60F2.5050902@ebi.ac.uk>

Hi

We've just reinstalled maker 2.31 using mpich3 (3.1) and are delighted that file locking and other issues have been resolved. (I'm running maker across several nodes on the compute farm.) The maker code is identical: I took the previous tar.gz archive and made a clean build.

Using a copy of a previous configuration to test, the only differences I can see are that the location of some files has changed (the working directory is on a different file system) and that I'm using a bigger (unfiltered) repeat library.

The previous maker run produced 17393 genes and 17393 mRNAs, and this new version gives 15927 genes and 21328 mRNAs.

I have alt_splice=0:

$ grep splice ../maker_opts.ctl
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no

Any idea why I'm getting multiple mRNAs per gene?

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Tue Apr 29 06:59:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Apr 2014 06:59:04 -0600
Subject: [maker-devel] unexpected alternate splicing
In-Reply-To: <535F60F2.5050902@ebi.ac.uk>
References: <535F60F2.5050902@ebi.ac.uk>
Message-ID: <1653CD3E-CEB7-437E-88CC-0F65C9BDA931@gmail.com>

Are you using gff3 files as input? If so, could you send those to me? They are probably coming from those.

--carson

Sent from my iPhone

From carson.holt at genetics.utah.edu Wed Apr 30 08:53:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 30 Apr 2014 14:53:29 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1398869131512.52399@uga.edu>
References: <1398869131512.52399@uga.edu>

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani

From carsonhh at gmail.com Wed Apr 30 08:55:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Apr 2014 08:55:12 -0600
Subject: [maker-devel] FW: protein2genome gene models

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From sjackman at gmail.com Wed Apr 30 17:25:17 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 30 Apr 2014 16:25:17 -0700
Subject: [maker-devel] est_forward and conflicting names

Hi, Carson.

I've downloaded a number of genes from GenBank using Entrez Direct, which I'm using with est and protein to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I'm using est_forward to retain the gene names. This workflow is working well for me.

Some of the genes pulled in from GenBank have less useful names like orf1234 or other numeric IDs. When multiple evidence sequences map to the same location, how does est_forward choose which name to use? If it's chosen arbitrarily, could it be possible to choose the most common name instead?
Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 05:23:59 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 07:23:59 -0400 Subject: [maker-devel] Installing Maker on Cray Message-ID: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at icloud.com Tue Apr 1 06:58:35 2014 From: carson.holt at icloud.com (Carson Holt) Date: Tue, 01 Apr 2014 06:58:35 -0600 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: Message-ID: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone > On Apr 1, 2014, at 5:23 AM, Matthew MacManes wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. 
Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.macmanes at unh.edu Tue Apr 1 10:11:55 2014 From: matthew.macmanes at unh.edu (Matthew MacManes) Date: Tue, 1 Apr 2014 12:11:55 -0400 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ *Matthew MacManes*, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt wrote: > Create a soft link called mpicc. I can't guarantee shared libraries are > installed on you system though as not all system derived versions of MPICH2 > have been configured with shared libraries. 
> > --Carson > > > > Sent from my iPhone > > On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: > > Hello, > > I am trying to install the MPI version of Maker on our Cray > supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page > > Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's > own proprietary compilers mpicc=cc and mpicxx=CC > > When running the 1st step in src 'perl Build.pl', it asks me for the > location of mpicc - I can give the full path to Cray equivalent cc, but it > is not recognized. Many other programs allow me to specify the c compiler, > e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. > > Any advice? > > Thanks, Matt > > __________________________________ > *Matthew MacManes*, Ph.D. > University of New Hampshire I Assistant Professor > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 I Twitter: @PeroMHC > Web: genomebio.org > Office: 189 Rudman Hall I Lab: 145 Rudman Hall > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Apr 1 10:29:40 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Apr 2014 16:29:40 +0000 Subject: [maker-devel] Installing Maker on Cray In-Reply-To: References: <08e81be4456d4f1e9256b28d8018b7e3@DRY.ad.unh.edu> Message-ID: <350474CE-B7EB-4EFF-9C8B-AD71FBB81CA3@illinois.edu> We might be interested in that ourselves at some point: https://bluewaters.ncsa.illinois.edu chris On Apr 1, 2014, at 11:11 AM, Matthew MacManes > wrote: Hi Carson and list: I tried that - we'll see if it works. I'm hung up on Perl dependencies right now - the Craycc compiler is not happy with several of them (forks, to name one). If anybody has installed Maker on a Cray, please contact me! Thanks, Matt __________________________________ Matthew MacManes, Ph.D. 
University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall On Tue, Apr 1, 2014 at 8:58 AM, Carson Holt > wrote: Create a soft link called mpicc. I can't guarantee shared libraries are installed on you system though as not all system derived versions of MPICH2 have been configured with shared libraries. --Carson Sent from my iPhone On Apr 1, 2014, at 5:23 AM, Matthew MacManes > wrote: Hello, I am trying to install the MPI version of Maker on our Cray supercomputer: http://trillian-use.sr.unh.edu/index.php/Main_Page Cray has MPICH2, but not the compilers mpicc and mpicxx. Cray has it's own proprietary compilers mpicc=cc and mpicxx=CC When running the 1st step in src 'perl Build.pl', it asks me for the location of mpicc - I can give the full path to Cray equivalent cc, but it is not recognized. Many other programs allow me to specify the c compiler, e.g, './configure mpicc=cc', but I cannot seem to do this with Maker. Any advice? Thanks, Matt __________________________________ Matthew MacManes, Ph.D. University of New Hampshire I Assistant Professor Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 I Twitter: @PeroMHC Web: genomebio.org Office: 189 Rudman Hall I Lab: 145 Rudman Hall _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
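
The soft-link suggestion in this thread can be sketched as follows. This is an illustration under stated assumptions, not a tested Cray recipe: the mapping mpicc to cc and mpicxx to CC comes from the original question, while the helper name, the directory layout, and the placeholder files are hypothetical.

```python
import os
import tempfile


def link_wrappers(bin_dir, mapping):
    """Create symlinks in bin_dir, e.g. mpicc -> /path/to/cc."""
    os.makedirs(bin_dir, exist_ok=True)
    created = {}
    for link_name, target in mapping.items():
        link_path = os.path.join(bin_dir, link_name)
        if os.path.lexists(link_path):
            os.remove(link_path)  # replace a stale link from an earlier attempt
        os.symlink(target, link_path)
        created[link_name] = link_path
    return created


# Demonstration with empty placeholder files standing in for the Cray
# compiler wrappers; on a real system the targets would be the full
# paths to cc and CC.
demo_root = tempfile.mkdtemp()
targets = {}
for name in ("fake_cc", "fake_cxx"):
    path = os.path.join(demo_root, name)
    open(path, "w").close()
    targets[name] = path

links = link_wrappers(
    os.path.join(demo_root, "bin"),
    {"mpicc": targets["fake_cc"], "mpicxx": targets["fake_cxx"]},
)
print(links["mpicc"], "->", os.readlink(links["mpicc"]))
```

The path to the resulting mpicc link could then be given when 'perl Build.pl' asks for the location of mpicc. As cautioned in the reply above, the build may still fail if the system MPICH2 was not configured with shared libraries.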
URL:

From jason at bioperl.org Tue Apr 1 12:39:14 2014
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 1 Apr 2014 11:39:14 -0700
Subject: [maker-devel] maker to EvidenceModeler
In-Reply-To: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>
References: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>
Message-ID:

I've used this script I wrote to make the necessary input files from maker GFF3.

https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/maker2evm.pl

Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Tue, Mar 25, 2014 at 9:33 AM, dhivya arasappan wrote:

> Hi Carson and others,
>
> Is there an easy tool/pipeline available as part of maker utilities to
> convert maker and SNAP output to files acceptable by EvidenceModeler?
>
> It looks like it also needs just gff files, but with a few tweaks.
> EvidenceModeler seems better equipped to handle PASA annotation results
> than maker results.
>
> Thanks
> Dhivya
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Apr 1 12:36:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 01 Apr 2014 12:36:44 -0600
Subject: [maker-devel] Missing UTRs in GFF
In-Reply-To:
References:
Message-ID:

It was indeed caused by the correct_est_fusion=1 option (which is supposed to trim off UTRs when an overlap of UTRs across genes appears to be caused by merged mRNA-seq). I have attached a patched file to replace .../maker/lib/maker/auto_annotator.pm, and I've updated the website download to include the patch as MAKER download version 2.31.3.
Thanks, Carson From: Benjamin Rubin Date: Tuesday, April 1, 2014 at 9:21 AM To: Carson Holt Subject: Re: [maker-devel] Missing UTRs in GFF OK, I think I uploaded everything. I included a cleaned up version of the control file without all of my paths in case that is useful. Thanks, Ben On Tue, Apr 1, 2014 at 9:50 AM, Carson Holt wrote: > Could upload your input fasta and hmm files as well. Sometimes I can > reproduce errors using just the raw reports, but it looks like I will need the > input files. > > --Carson > > > From: Benjamin Rubin > Date: Tuesday, April 1, 2014 at 8:38 AM > To: Carson Holt > Subject: Re: [maker-devel] Missing UTRs in GFF > > Hi Carson, > > I tried using version 2.31 on a scaffold where this problem occurred with 2.30 > and got the same result, unfortunately. I did use corr_est_fusion=1 both times > so this might be related. I have uploaded the sequence for this scaffold and > the output directory under username "brubin". Is this the data that you meant? > > I am also reattaching information on a representative problem gene from this > scaffold that occurs at base 1330779. > > Thanks so much for the help, > Ben > > > On Mon, Mar 31, 2014 at 9:37 AM, Carson Holt wrote: >> Not something I've seen before, but there was a patch for another issue that >> was cause by the use of avoid_est_fusion=1, that may be related. Try the >> current stable release 2.31, and let me know if it still happens. >> >> You can also upload the contig folder from one of the regions in question >> here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> Then I could verify the bug, and see if it is something that happens in the >> current release. >> >> --Carson >> >> >> From: Benjamin Rubin >> Date: Saturday, March 29, 2014 at 10:24 AM >> To: >> Subject: [maker-devel] Missing UTRs in GFF >> >> I have annotated a eukaryotic genome with MAKER 2.30. 
I recently realized >> that there are a few genes in the GFF file produced by gff3_merge with >> inconsistencies in the annotated CDS and UTRs. For most of my genes, the UTRs >> have their own lines in the GFF file. However, for the problematic genes, the >> UTRs are not specified in the GFF file and all exons are annotated as CDS. >> The UTRs do appear in the gene header and the protein sequences are the >> correct length (do not include the UTR). I have attached an example from the >> GFF file. >> >> Is this a known problem, or have I done something wrong? Is there an easy way >> to fix the GFF file? >> >> Thanks for your help, >> Ben >> >> -- >> _____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > > > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > benrubin.org > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: auto_annotator.pm
Type: text/x-perl-script
Size: 101568 bytes
Desc: not available
URL:

From amelia.ireland at gmod.org Thu Apr 3 15:10:53 2014
From: amelia.ireland at gmod.org (Amelia Ireland)
Date: Thu, 3 Apr 2014 14:10:53 -0700
Subject: [maker-devel] GMOD Online Training 2014
Message-ID:

Greetings GMOD community!

Applications are now open for the 2014 GMOD online training course, to be held from May 19th - 23rd 2014. The course will cover the installation, configuration, and usage of core GMOD software, including GBrowse and JBrowse, Galaxy, MAKER, Tripal, WebApollo, Canto, and the Chado database. The course is taught by experienced instructors and developers with deep knowledge of the tools. Although the course will be run online, students will be able to interact with the tutors and fellow attendees, ask questions, and so on.

For more information and to apply, please see http://gmod.org/wiki/GMOD_Online_Training_2014

If you have any questions, please contact the GMOD help desk at help at gmod.org.

Thanks!

--
Amelia Ireland
GMOD Community Support
Generic Model Organism Database project
http://gmod.org || @gmodproject
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Brian.Mack at ARS.USDA.GOV Mon Apr 7 06:55:01 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Mon, 7 Apr 2014 12:55:01 +0000
Subject: [maker-devel] maker_functional_gff
Message-ID:

Hi,

I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file:

Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097.
Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128.
Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI

Any ideas of what I'm doing wrong?

Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 7 08:58:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 08:58:20 -0600
Subject: [maker-devel] maker_functional_gff
Message-ID:

maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder.

--Carson

From: "Mack, Brian"
Date: Monday, April 7, 2014 at 6:55 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] maker_functional_gff

Hi,

I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50.
I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file:

Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097.
Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128.
Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI

Any ideas of what I'm doing wrong?

Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 7 09:02:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 09:02:55 -0600
Subject: [maker-devel] maker_functional_gff
In-Reply-To:
References:
Message-ID:

I added a line to look for the UniRef header format in the attached scripts. Go ahead and give it a try.
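
For anyone who would rather convert the headers than swap in the attached scripts, the workaround suggested earlier in this thread (Tax= to OS=, RepID= to GN=, plus a placeholder PE=) might be sketched as below. The function names and the placeholder value "4" are assumptions made for illustration, and it has not been confirmed here that maker_functional_gff accepts the rewritten headers.

```python
import re


def uniref_to_uniprot_style(header, pe_placeholder="4"):
    """Rename UniRef tags to the OS=/GN=/PE= tags the script looks for."""
    header = header.rstrip("\n")
    header = re.sub(r"\bTax=", "OS=", header)
    header = re.sub(r"\bRepID=", "GN=", header)
    if " PE=" not in header:
        # The PE value is only a placeholder, as suggested in the thread.
        header += " PE=" + pe_placeholder
    return header


def convert_fasta(lines):
    """Yield FASTA lines with header lines rewritten, sequences untouched."""
    for line in lines:
        if line.startswith(">"):
            yield uniref_to_uniprot_style(line) + "\n"
        else:
            yield line


# Header taken from the error message quoted in this thread.
example = ">UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI"
print(uniref_to_uniprot_style(example))
```

To convert a whole database, convert_fasta could be fed an open file handle and its output written to a new FASTA file, which keeps the original download intact.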

--Carson

From: Carson Holt
Date: Monday, April 7, 2014 at 8:58 AM
To: "Mack, Brian", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] maker_functional_gff

maker_functional_gff works with UniProt/Swiss-Prot. The uniref-50 headers are different. The script looks for the OS= GN= and PE= tags. You might be able to coerce it into working on the UniRef header by changing Tax= to OS=, RepID= to GN= and then adding a PE= to the end of the header as just a placeholder.

--Carson

From: "Mack, Brian"
Date: Monday, April 7, 2014 at 6:55 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] maker_functional_gff

Hi,

I am trying to use the maker_functional_gff program to add functional annotations to my maker gff file. I used blastp with the tabular "-outfmt 6" option against the uniprot uniref-50. I put these results in the maker_functional_gff program using "maker_functional_gff uniref-50 blastp-output maker.gff" but I get the following errors and no updating of the names in my maker gff file:

Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924097.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924097.
Can't parse details from FASTA header: >UniRef50_K1R9E3 Uncharacterized protein n=1 Tax=Crassostrea gigas RepID=K1R9E3_CRAGI
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 142, <$IN> line 16924128.
Use of uninitialized value $id in hash element at /home/b/maker/bin/maker_functional_gff line 144, <$IN> line 16924128.
Can't parse details from FASTA header: >UniRef50_K1R9E4 Transporter n=2 Tax=Mollusca RepID=K1R9E4_CRAGI

Any ideas of what I'm doing wrong?

Brian

This electronic message contains information generated by the USDA solely for the intended recipients.
Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_functional_fasta
Type: application/octet-stream
Size: 3452 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_functional_gff
Type: application/octet-stream
Size: 4103 bytes
Desc: not available
URL:

From darasappan at gmail.com Mon Apr 7 09:57:08 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 7 Apr 2014 10:57:08 -0500
Subject: [maker-devel] keep_preds parameter
Message-ID: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>

Hello,

I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?

Thanks
dhivya

From dence at genetics.utah.edu Mon Apr 7 10:06:15 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 7 Apr 2014 16:06:15 +0000
Subject: [maker-devel] keep_preds parameter
In-Reply-To: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>
Message-ID:

Hi Dhivya,

That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with an AED equal to zero have no support from the evidence sets (protein and EST and alt_EST).

~Daniel

On Apr 7, 2014, at 9:57 AM, dhivya arasappan wrote:

> Hello,
>
> I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?
>
> Thanks
> dhivya
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From darasappan at gmail.com Mon Apr 7 10:31:55 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 7 Apr 2014 11:31:55 -0500
Subject: [maker-devel] keep_preds parameter
In-Reply-To:
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>
Message-ID:

Thank you Daniel. But I thought an AED score of zero indicates complete agreement of annotation to evidence and that 1 would mean no agreement?

Dhivya

On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote:

> Hi Dhivya,
>
> That's a correct understanding of keep_preds, and it is a binary parameter; you either tell MAKER to keep the unsupported predictions or not to keep the unsupported predictions. In the output, you can tell which genes are supported by the _AED attribute in the gff3 file. Genes with an AED equal to zero have no support from the evidence sets (protein and EST and alt_EST).
>
> ~Daniel
> On Apr 7, 2014, at 9:57 AM, dhivya arasappan
> wrote:
>
>> Hello,
>>
>> I'm looking for a little more explanation about keep_preds parameter. The documentation says that it is a threshold to add unsupported gene predictions. Along with some other changes, I set keep_preds=1 and saw a huge jump in the number of genes I was getting. Is setting this parameter to 1 equivalent to saying, include all predicted genes in my output, even if they are not supported by my set or protein data? Is there a way to tell from my output which genes are unsupported and which are not? Also, are the only two options for this parameter 0 and 1?
>>
>> Thanks
>> dhivya
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com Mon Apr 7 10:33:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 10:33:59 -0600
Subject: [maker-devel] keep_preds parameter
In-Reply-To:
References: <78522D2B-CDE0-4CBF-83A5-DC1FB255D3E8@gmail.com>
Message-ID:

True. Daniel had the numbers backwards (I often accidentally do that as well).

--Carson

On 4/7/14, 10:31 AM, "dhivya arasappan" wrote:

>Thank you Daniel. But I thought an AED score of zero indicates complete
>agreement of annotation to evidence and that 1 would mean no agreement?
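
With the direction of the score settled in this thread (an AED of 0 means complete agreement with the evidence, 1 means no agreement), separating supported from unsupported models can be sketched as follows. This is a minimal illustration, assuming the _AED attribute sits in the ninth column of the mRNA features; the function names and the two sample records are made up for the example.

```python
def aed_of(attributes):
    """Return the _AED value from a GFF3 attribute column, or None."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "_AED":
            return float(value)
    return None


def id_of(attributes):
    """Return the ID attribute, or None if the feature has no ID."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "ID":
            return value
    return None


def split_by_support(gff_lines):
    """Split mRNA IDs into (supported, unsupported) using AED < 1 vs AED == 1."""
    supported, unsupported = [], []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        aed = aed_of(cols[8])
        if aed is None:
            continue
        # AED of 1 means no agreement with any evidence.
        (unsupported if aed >= 1.0 else supported).append(id_of(cols[8]))
    return supported, unsupported


# Made-up records for illustration only.
sample = [
    "##gff-version 3\n",
    "contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1;_AED=0.05;_eAED=0.05\n",
    "contig1\tmaker\tmRNA\t2000\t2600\t.\t-\t.\tID=gene2-mRNA-1;Parent=gene2;_AED=1.00;_eAED=1.00\n",
]
supported, unsupported = split_by_support(sample)
print(supported, unsupported)
```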
>
>Dhivya
>
>On Apr 7, 2014, at 11:06 AM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> That's a correct understanding of keep_preds, and it is a binary
>>parameter; you either tell MAKER to keep the unsupported predictions or
>>not to keep the unsupported predictions. In the output, you can tell
>>which genes are supported by the _AED attribute in the gff3 file. Genes
>>with an AED equal to zero have no support from the evidence sets
>>(protein and EST and alt_EST).
>>
>> ~Daniel
>> On Apr 7, 2014, at 9:57 AM, dhivya arasappan
>> wrote:
>>
>>> Hello,
>>>
>>> I'm looking for a little more explanation about keep_preds parameter.
>>>The documentation says that it is a threshold to add unsupported gene
>>>predictions. Along with some other changes, I set keep_preds=1 and saw
>>>a huge jump in the number of genes I was getting. Is setting this
>>>parameter to 1 equivalent to saying, include all predicted genes in my
>>>output, even if they are not supported by my set or protein data? Is
>>>there a way to tell from my output which genes are unsupported and
>>>which are not? Also, are the only two options for this parameter 0 and
>>>1?
>>>
>>> Thanks
>>> dhivya
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From nextgen.usfs at gmail.com Mon Apr 7 16:34:32 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Mon, 7 Apr 2014 17:34:32 -0500
Subject: [maker-devel] fasta_merge ARRAY error
Message-ID:

Hello,

I'm getting an error when running fasta_merge as follows:

Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942.

The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file?

Thanks,

Jon

From dence at genetics.utah.edu Mon Apr 7 19:23:07 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 8 Apr 2014 01:23:07 +0000
Subject: [maker-devel] fasta_merge ARRAY error
In-Reply-To:
References:
Message-ID:

Hi Jon, Will you please send the command that gave you that error? Also, will you upload the maker control files you used and the gff3 file to the URL below?

http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360

Also, which version of MAKER are you using?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS Ion PGM [nextgen.usfs at gmail.com]
Sent: Monday, April 07, 2014 4:34 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] fasta_merge ARRAY error

Hello,

I'm getting an error when running fasta_merge as follows:

Can't use an undefined value as an ARRAY reference at /home/ngs/maker/bin/fasta_merge line 116, line 1942.

The result is that the fasta files are somewhat truncated, that is they do not match the gff3 file created from gff3_merge (which does run without any errors). Seems like it is getting stuck somewhere and then crashes. Is there another way to easily get the CDS out of the maker generated GFF file?

Thanks,

Jon

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Apr 7 20:02:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 07 Apr 2014 20:02:30 -0600
Subject: [maker-devel] fasta_merge ARRAY error
In-Reply-To:
References:
Message-ID:

What version of MAKER are you using, and did you run with the new trnascan option turned on? Basically the script is finding a fasta file for transcripts but the file for proteins is missing. Turning trnascan on can do this (obviously tRNAs can encode transcripts but don't encode proteins). The version of fasta_merge included in the current MAKER 2.31.3 download should handle this correctly.

--Carson

On 4/7/14, 7:23 PM, "Daniel Ence" wrote:

>Hi Jon, Will you please send the command that gave you that error? Also,
>will you upload the maker control files you used and the gff3 file to the
>URL below?
>
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360
>
>Also, which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS
>Ion PGM [nextgen.usfs at gmail.com]
>Sent: Monday, April 07, 2014 4:34 PM
>To: maker-devel at yandell-lab.org
>Subject: [maker-devel] fasta_merge ARRAY error
>
>Hello,
>
>I'm getting an error when running fasta_merge as follows:
>
>Can't use an undefined value as an ARRAY reference at
>/home/ngs/maker/bin/fasta_merge line 116, line 1942.
>
>The result is that the fasta files are somewhat truncated, that is they
>do not match the gff3 file created from gff3_merge (which does run
>without any errors). Seems like it is getting stuck somewhere and then
>crashes.
Is there another way to easily get the CDS out of the maker
>generated GFF file?
>
>Thanks,
>
>Jon
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From nextgen.usfs at gmail.com Tue Apr 8 06:56:22 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Tue, 8 Apr 2014 07:56:22 -0500
Subject: [maker-devel] fasta_merge ARRAY error
In-Reply-To:
References:
Message-ID: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com>

Hi Carson and Daniel,

I'm running Maker 2.31.2 and yes I did have tRNAscan turned on - so perhaps I should just get fasta_merge from 2.31.3 and give it a shot. But first to clarify, fasta_merge -d maker1_master_datastore_index.log - returns the appropriate files, however both the maker.all.proteins.fasta and maker.all.transcripts.fasta return 7401 with a grep command counting ">", while the gff3_merge -d maker1_master_datastore_index.log runs without failure and a grep command counting "gene" returns 7525 models.

I uploaded the files requested below. Thanks for the help.

-Jon

On Apr 7, 2014, at 9:02 PM, Carson Holt wrote:

> What version of MAKER are you using, and did you run with the new trnascan
> option turned on? Basically the script is finding a fasta file for
> transcripts but the file for proteins is missing. Turning trnascan on can
> do this (obviously tRNAs can encode transcripts but don't encode
> proteins). The version of fasta_merge included in the current MAKER
> 2.31.3 download should handle this correctly.
>
> --Carson
>
>
>
> On 4/7/14, 7:23 PM, "Daniel Ence" wrote:
>
>> Hi Jon, Will you please send the command that gave you that error?
Also, >> will you upload the maker control files you used and the gff3 file to the >> URL below? >> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >> >> Also, which version of MAKER are you using? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of USFS >> Ion PGM [nextgen.usfs at gmail.com] >> Sent: Monday, April 07, 2014 4:34 PM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] fasta_merge ARRAY error >> >> Hello, >> >> I?m getting an error when running fasta_merge as follows: >> >> Can't use an undefined value as an ARRAY reference at >> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >> >> The result is that the fasta files are somewhat truncated, that is they >> do not match the gff3 file created from gff3_merge (which does run >> without any errors). Seems like it is getting stuck somewhere and then >> crashes. Is there another way to easily get the CDS out of the maker >> generated GFF file? 
>> >> Thanks, >> >> Jon >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Tue Apr 8 08:54:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 08:54:05 -0600 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: I've attached the fixed version (I see that the patched one is not in 2.31.3, but I'll get that taken care of). The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other files will have only the coding genes. --Carson On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: >Hi Carson and Daniel, >I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >returns the appropriate files, however both the maker.all.proteins.fasta >and maker.all.transcripts.fasta return 7401 with a grep command counting >?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >without failure and a grep command counting ?gene? returns 7525 models. > >I uploaded the files requested below. Thanks for the help. > >-Jon > > >On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: > >> What version of MAKER are you using, and did you run with the new >>trnascan >> option turned on? Basically the script is finding a fasta file for >> transcripts but the file for proteins is missing. Turning trnascan on >>can >> do this (obviously tRNAs can encode transcripts but don't encode >> proteins). 
The version of fasta_merge included in the current MAKER >> 2.31.3 download should handle this correctly. >> >> --Carson >> >> >> >> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >> >>> Hi Jon, Will you please send the command that gave you that error? >>>Also, >>> will you upload the maker control files you used and the gff3 file to >>>the >>> URL below? >>> >>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>> >>> Also, which version of MAKER are you using? >>> >>> Thanks, >>> Daniel >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>USFS >>> Ion PGM [nextgen.usfs at gmail.com] >>> Sent: Monday, April 07, 2014 4:34 PM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] fasta_merge ARRAY error >>> >>> Hello, >>> >>> I?m getting an error when running fasta_merge as follows: >>> >>> Can't use an undefined value as an ARRAY reference at >>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>> >>> The result is that the fasta files are somewhat truncated, that is they >>> do not match the gff3 file created from gff3_merge (which does run >>> without any errors). Seems like it is getting stuck somewhere and then >>> crashes. Is there another way to easily get the CDS out of the maker >>> generated GFF file? 
>>> >>> Thanks, >>> >>> Jon >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: fasta_merge Type: application/octet-stream Size: 2978 bytes Desc: not available URL: From nextgen.usfs at gmail.com Tue Apr 8 10:01:18 2014 From: nextgen.usfs at gmail.com (Jon Palmer) Date: Tue, 08 Apr 2014 11:01:18 -0500 Subject: [maker-devel] fasta_merge ARRAY error In-Reply-To: References: <90D87B84-7247-4E37-ABA3-FB127704F684@gmail.com> Message-ID: <53441D4E.2070502@gmail.com> Thanks Carson, error is gone and is now working. Thanks for a great tool and for the fantastic support! -Jon On 04/08/2014 09:54 AM, Carson Holt wrote: > I've attached the fixed version (I see that the patched one is not in > 2.31.3, but I'll get that taken care of). > > The tRNA genes will be in the maker.trnascan.transcripts.fasta. The other > files will have only the coding genes. > > --Carson > > > > On 4/8/14, 6:56 AM, "USFS Ion PGM" wrote: > >> Hi Carson and Daniel, >> I?m running Maker 2.31.2 and yes I did have tRNAscan turned on - so >> perhaps I should just get fasta_merge from 2.31.3 and give it a shot. >> But first to clarify, fasta_merge -d maker1_master_datastore_index.log - >> returns the appropriate files, however both the maker.all.proteins.fasta >> and maker.all.transcripts.fasta return 7401 with a grep command counting >> ?>?, while the gff3_merge -d maker1_master_datastore_index.log runs >> without failure and a grep command counting ?gene? returns 7525 models. >> >> I uploaded the files requested below. Thanks for the help. 
>> >> -Jon >> >> >> On Apr 7, 2014, at 9:02 PM, Carson Holt wrote: >> >>> What version of MAKER are you using, and did you run with the new >>> trnascan >>> option turned on? Basically the script is finding a fasta file for >>> transcripts but the file for proteins is missing. Turning trnascan on >>> can >>> do this (obviously tRNAs can encode transcripts but don't encode >>> proteins). The version of fasta_merge included in the current MAKER >>> 2.31.3 download should handle this correctly. >>> >>> --Carson >>> >>> >>> >>> On 4/7/14, 7:23 PM, "Daniel Ence" wrote: >>> >>>> Hi Jon, Will you please send the command that gave you that error? >>>> Also, >>>> will you upload the maker control files you used and the gff3 file to >>>> the >>>> URL below? >>>> >>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=360 >>>> >>>> Also, which version of MAKER are you using? >>>> >>>> Thanks, >>>> Daniel >>>> >>>> >>>> Daniel Ence >>>> Graduate Student >>>> Eccles Institute of Human Genetics >>>> University of Utah >>>> 15 North 2030 East, Room 2100 >>>> Salt Lake City, UT 84112-5330 >>>> ________________________________________ >>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>> USFS >>>> Ion PGM [nextgen.usfs at gmail.com] >>>> Sent: Monday, April 07, 2014 4:34 PM >>>> To: maker-devel at yandell-lab.org >>>> Subject: [maker-devel] fasta_merge ARRAY error >>>> >>>> Hello, >>>> >>>> I?m getting an error when running fasta_merge as follows: >>>> >>>> Can't use an undefined value as an ARRAY reference at >>>> /home/ngs/maker/bin/fasta_merge line 116, line 1942. >>>> >>>> The result is that the fasta files are somewhat truncated, that is they >>>> do not match the gff3 file created from gff3_merge (which does run >>>> without any errors). Seems like it is getting stuck somewhere and then >>>> crashes. Is there another way to easily get the CDS out of the maker >>>> generated GFF file? 
>>>> >>>> Thanks, >>>> >>>> Jon >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> From sjackman at gmail.com Tue Apr 8 13:21:38 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Tue, 8 Apr 2014 12:21:38 -0700 Subject: [maker-devel] Changing rmlib runs RepeatRunner Message-ID: Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 8 14:00:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 08 Apr 2014 14:00:11 -0600 Subject: [maker-devel] Changing rmlib runs RepeatRunner In-Reply-To: References: Message-ID: RepeatRunner runs on what was not masked by RepeatMasker, so changing rmlib can cause RepeatRunner to give slightly different results because RepeatMasker results changed. --Carson From: Shaun Jackman Date: Tuesday, April 8, 2014 at 1:21 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Changing rmlib runs RepeatRunner Changing `rmlib` causes not just RepeatMasker to be rerun, but also RepeatRunner. Is the latter necessary? Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 

From sjackman at gmail.com Thu Apr 10 12:34:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 11:34:34 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER

The GlimmerHMM gene prediction software outputs a GFF file that includes
mRNA and CDS features, but it does not include gene or exon features, and
so it does not appear to be working with MAKER. Has anyone else used
GlimmerHMM with MAKER, and how did you deal with this issue?

Cheers,
Shaun

From carsonhh at gmail.com Thu Apr 10 12:53:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 12:53:55 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER

Make sure it's not GTF or GFF2, but if it is GFF3 you can substitute match
for mRNA and match_part for CDS. It will then be interpreted as a
two-level alignment feature, which can be given to pred_gff.

--Carson

From sjackman at gmail.com Thu Apr 10 15:32:55 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 14:32:55 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER

Thanks, Carson. That helps. I'm trying to do a completely ab initio gene
annotation without any EST or protein homology evidence, at least for now.
The GFF file produced by MAKER is empty. How do I carry the GlimmerHMM
pred_gff (or model_gff) annotations through to the end? Ultimately, I'd
like to merge annotations from multiple ab initio predictions.

Cheers,
Shaun

From carsonhh at gmail.com Thu Apr 10 15:35:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 15:35:17 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER

keep_preds=1 will force MAKER to keep ab initio results even if there is
no evidence supporting them.

--Carson

From sjackman at gmail.com Thu Apr 10 16:51:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 10 Apr 2014 15:51:34 -0700
Subject: [maker-devel] Using GlimmerHMM with MAKER

That worked! Thanks again, Carson.

A note for the record: I found that keep_preds=1 carries forward pred_gff
annotations, but not model_gff annotations when that GFF file uses match
and match_part annotations (like a munged GlimmerHMM GFF file), which
makes sense I guess now that I think about it.

Cheers,
Shaun
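
For readers following along, the substitution Carson describes earlier in this thread (match for mRNA, match_part for CDS) can be sketched in a few lines of Python. This is only an illustration and not part of MAKER: the function name and the sample records are made up, and it assumes tab-separated GFF3 with nine columns. Whether a given GlimmerHMM file needs further cleanup before MAKER accepts it is not covered in the thread.

```python
# Hypothetical helper: rename GFF3 feature types so that a two-level
# mRNA/CDS prediction reads as a match/match_part alignment feature.
RENAME = {"mRNA": "match", "CDS": "match_part"}


def to_match_features(gff3_lines):
    """Return the lines with column 3 rewritten; other content is untouched."""
    out = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass comments, directives and malformed lines through unchanged.
        if line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        cols[2] = RENAME.get(cols[2], cols[2])
        out.append("\t".join(cols))
    return out


# Made-up sample records, for illustration only.
sample = [
    "##gff-version 3",
    "scaffold1\tGlimmerHMM\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1",
    "scaffold1\tGlimmerHMM\tCDS\t100\t400\t.\t+\t0\tID=g1.t1.cds1;Parent=g1.t1",
]
for converted_line in to_match_features(sample):
    print(converted_line)
```

The converted file can then be given to pred_gff, with keep_preds=1 if the predictions should be kept without supporting evidence, as discussed above.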

From carsonhh at gmail.com Thu Apr 10 16:55:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Apr 2014 16:55:07 -0600
Subject: [maker-devel] Using GlimmerHMM with MAKER

The model_gff option can only take gene/mRNA/exon/CDS features, and will
ignore match/match_part features. It's a little more restrictive.

--Carson

From rbharris at uw.edu Mon Apr 14 19:45:13 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Mon, 14 Apr 2014 18:45:13 -0700
Subject: [maker-devel] empty genome.ann/genome.dna

Hi,

I recently set up MAKER on a new computer and am having trouble running a
dataset that was run successfully on a different computer. After MAKER
finished, I ran gff3_merge and maker2zff, which returned empty genome.ann
and genome.dna files. I have tried installing older versions of
dependencies and have tinkered with the control files, but I still can't
figure out what the issue is. The only difference I can find is that the
.all.gff file from a successful run has lines at the beginning of the
file reporting the success of exonerate. In the failing run these are not
reported; it just goes straight to the fasta output. However, exonerate
appears to work successfully when run outside of the MAKER pipeline.

Any help would be greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com Tue Apr 15 09:33:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Apr 2014 09:33:45 -0600
Subject: [maker-devel] empty genome.ann/genome.dna

Could you upload your control files and job input files here -->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

I'll take a look to see if there is any problem with your job's setup.
Also, what version of MAKER are you running?

--Carson

From bioinformatics.umd at gmail.com Tue Apr 15 11:01:37 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Tue, 15 Apr 2014 13:01:37 -0400
Subject: [maker-devel] passing names from a gff to new predictions
Message-ID: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>

Hello

I have an interesting issue with an existing Maker gff. I have a gff file
with human-friendly names that I would like to pass to the new
predictions. However, some of the genes in the human-friendly gff file
are incorrect or have errors. If I use the gff as model_gff or pred_gff
with map_forward=1, the names move but so do the incorrect models. Maker
simply duplicates these predictions to the new outputs. If I remove the
GFF file from the ctl file I get new predictions that have the necessary
corrections, but they now have unfriendly names. Do you have any
suggestions on how to associate the old names with the new predictions? I
could simply BLAST the old proteins against the new ones and associate
them in that manner, but I was wondering if there were any other options
within Maker.

Since I have the GFF files I also have the associated transcripts and
proteins. Do I need to do some iteration of est2genome and then generate
a new model gff file?

The issue we are dealing with is thousands of short introns in our gff
file. These are less than 20 bp and are not biologically feasible, so we
are trying to correct the gene model predictions.

Cheers
Ian

From carsonhh at gmail.com Tue Apr 15 11:31:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Apr 2014 11:31:35 -0600
Subject: [maker-devel] passing names from a gff to new predictions
In-Reply-To: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>
References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>

If you give anything to pred_gff or model_gff then it is allowed to
compete as a predictor and thus can end up in the final results. You
stated that the models you are passing in have errors, and you don't want
them to be allowed to compete and end up in your final models, correct?

MAKER is not made to expect erroneous input, so I don't have an easy
solution for you (I do have a less easy solution, though it requires some
editing of the MAKER code).

1. Open .../maker/lib/maker/auto_annotator.pm in an editor like emacs or
   vi.
2. Search for the 'best_annotations' subroutine (around line 1248,
   depending on which version of MAKER you have).
3. Then edit it as follows:

This is how the top section of the subroutine should look at first -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = @{$CTL_OPT->{_predictor}};

        ...

Change it to this -->

    sub best_annotations {
        my $annotations = shift;
        my $CTL_OPT = shift;

        my @predictors = grep {!/model_gff/} @{$CTL_OPT->{_predictor}};

        ...

Now run maker again with your old GFF3 file as input to model_gff, and
just remember to change the MAKER code back to the way it was when you're
done with everything. Basically the change will hard filter model_gff
results from being allowed into your final annotations. So names will
still move from model_gff to your final results with the map_forward=1
option, but none of the old models will make it as gene/mRNA/exon/CDS
features in the final GFF3 (they will still be listed as match/match_part
reference features though).

Thanks,
Carson

From bioinformatics.umd at gmail.com Tue Apr 15 11:54:00 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Tue, 15 Apr 2014 13:54:00 -0400
Subject: [maker-devel] passing names from a gff to new predictions
References: <3802A5F7-A673-4062-BDCD-4640E93EA54F@gmail.com>
Message-ID: <31BC21FD-D9D6-4B66-B0D7-C48FBC3B7A98@gmail.com>

Carson,

That seems to fix this issue. Thanks for the insight; it is not something
I would ever have come up with.

Cheers
Ian

From robert.king at rothamsted.ac.uk Wed Apr 16 05:27:09 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Wed, 16 Apr 2014 11:27:09 +0000
Subject: [maker-devel] scalar text in maker transcripts
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8DAC@rothex1.rothamsted.ac.uk>

Hi,

I've got some strange characters in my maker transcripts (I used keep
predictions). I opened the file in wordpad

ACTTCGACATTCTCCGTCACCAATTCAATCACCCCACACGAACAACCATCGGAGCCTCCC
AGAACTCGCATTACCGACTTCAAGATGTCSCALAR(0xf5397d8)SCALAR(0xc4cad
88)CTTCTTTCTACGGCGCTGGCCGCAAGGTCCTCGGCTACAACTCTTACTTCGGAAACT

Any ideas what may cause this?

Thanks
Rob

--
This message has been scanned for viruses and dangerous content by
MailScanner, and we believe but do not warrant that this e-mail and any
attachments thereto do not contain any viruses. However, you are fully
responsible for performing any virus scanning.

From carsonhh at gmail.com Wed Apr 16 15:56:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Apr 2014 15:56:25 -0600
Subject: [maker-devel] scalar text in maker transcripts

The only time I have seen this is when fgenesh is used as a predictor and
correct_est_fusion=1 is set (it was a bug in trimming long UTRs on
fgenesh models). Is that how you have your job configured? If so, that
particular bug is fixed in the current MAKER release.

Thanks,
Carson

From robert.king at rothamsted.ac.uk Wed Apr 16 15:57:44 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Wed, 16 Apr 2014 21:57:44 +0000
Subject: [maker-devel] scalar text in maker transcripts
In-Reply-To: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
References: <26314411-75c8-484f-9fbf-413e37d1c706@ROTHEX1.rothamsted.ac.uk>
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7C8E85@rothex1.rothamsted.ac.uk>

Yep, I am. I'll try upgrading.

Thanks
Rob
From muriel.grosb at gmail.com Mon Apr 7 06:29:42 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 7 Apr 2014 14:29:42 +0200
Subject: [maker-devel] Help for Repeat Library Construction
Message-ID: <474C2DF8-B5DF-424B-BCF7-EC64BC23EEDC@gmail.com>

Hello,

I am working on the annotation of the date palm genome using the MAKER pipeline. I started by following the manual for Repeat Library Construction - Advanced. I am stuck at step 2.1.3, where I should use MUSCLE to filter, but I don't understand what the file flankingseqfile is. How can I obtain it? Also, do you have more information about steps 2.1.4 and 2.1.5?

Thanks a lot for this great pipeline and for your help,
Muriel Gros-Balthazard

From Brian.Mack at ARS.USDA.GOV Thu Apr 17 14:34:21 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Thu, 17 Apr 2014 20:34:21 +0000
Subject: [maker-devel] tbl2asn errors
Message-ID:

Hi,

I thought I would try asking my question here as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER GFF3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to convert fine; however, when I use NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

4 ERROR: SEQ_FEAT.BadTrailingCharacter
217 ERROR: SEQ_FEAT.NoStop
438 ERROR: SEQ_FEAT.ShortIntron
171 ERROR: SEQ_FEAT.StartCodon
171 ERROR: SEQ_INST.BadProteinStart
291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
118 WARNING: SEQ_FEAT.ShortExon

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 - 441931, but the sqn file will have 440868 - 441930. Any ideas what might be causing this?
Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

From carsonhh at gmail.com Thu Apr 17 14:59:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Apr 2014 14:59:05 -0600
Subject: [maker-devel] tbl2asn errors
Message-ID:

The only one that may be a real error is the first one (I'm not sure what it means). You probably need to find them and open them in a viewer like Apollo. The rest I would consider warnings (the NCBI tool doesn't like any weirdness or uncertainty). You often have to manually edit things to get NCBI to accept all models without complaining (sometimes even going against real biology). I know some groups use the always_complete=1 option in MAKER to force start and stop codons into every model, for example (even though those forced codons are probably false).
*Not sure about this one --> 4 ERROR: SEQ_FEAT.BadTrailingCharacter
*These are partial genes with no stop (usually happen at the edge of contigs or near strings of NNNN) --> 217 ERROR: SEQ_FEAT.NoStop
*These are just short introns (intron size is under control of the ab initio predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) --> 171 ERROR: SEQ_FEAT.StartCodon
*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) --> 171 ERROR: SEQ_INST.BadProteinStart
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) --> 291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor
*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) --> 648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor
*These are just short exons (exon size is under control of the ab initio predictors) --> 118 WARNING: SEQ_FEAT.ShortExon

You probably need to identify examples of models causing each issue, and then look at them in Apollo. Apollo lets you open tbl format and save back to it.

I imagine the coordinate change is from NCBI using a 0-based coordinate system as opposed to a 1-based system (i.e. the first base is 0 rather than 1).

Unfortunately, getting everything to go into NCBI is usually a grueling task.

--Carson

From Scott.Geib at ARS.USDA.GOV Thu Apr 17 14:59:22 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 20:59:22 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References:
Message-ID: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>

Hi Brian,

We have a tool in development to deal with this. You should not upload your MAKER output directly to NCBI; you need to filter out genes, check that things are sane, etc.

http://brianreallymany.github.io/GAG/

It is still in active development; the first full release is planned for the end of this month (if you can wait 1.5 weeks).
It has no dependencies and maintains parent/child relationships (for example, if you remove a gene, it will also remove the associated CDS/mRNA). In a release planned for the end of the month, you will be able to perform functions like removing short features, removing long features, flagging things for review, etc. It also generates an updated genome.fasta file, gff3 file, and sequence files for CDS/mRNA/peptide based on the edits made.

Hopefully this is helpful to you.

Scott

From carsonhh at gmail.com Thu Apr 17 15:27:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Apr 2014 15:27:53 -0600
Subject: [maker-devel] tbl2asn errors
In-Reply-To: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
Message-ID:

Very cool. I'll try it out as well.

--Carson
From Scott.Geib at ARS.USDA.GOV Thu Apr 17 16:37:49 2014
From: Scott.Geib at ARS.USDA.GOV (Geib, Scott)
Date: Thu, 17 Apr 2014 22:37:49 +0000
Subject: [maker-devel] tbl2asn errors
In-Reply-To:
References: <0D54878997A4B9478F03938D61DB51D4266B6B@001FSN2MPN1-015.001f.mgd2.msft.net>
Message-ID: <0D54878997A4B9478F03938D61DB51D4266C1E@001FSN2MPN1-015.001f.mgd2.msft.net>

Just so you are not discouraged: the current version has limited functionality and is pretty much undocumented (although it will write a .tbl file). I will email the list when the first real release is complete and documented.

Scott
From bioinformatics.umd at gmail.com Fri Apr 18 07:14:45 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 18 Apr 2014 09:14:45 -0400
Subject: [maker-devel] Short Introns
Message-ID:

Hello,

We are preparing two submissions for NCBI (a nightmare). Some of our MAKER gene models have short introns that are being flagged by NCBI. In one species we have >400 introns smaller than 20 bp, which is almost biologically impossible. I know we can set the max intron length in the opts.ctl file, but can we set a minimum intron length?

I saw yesterday's posts mentioning that this is a result of the external ab initio predictors, but I didn't see an indication as to which predictor is responsible or how to change that setting.

From yesterday:
*These are just short introns (intron size is under control of the ab initio predictors) --> 438 ERROR: SEQ_FEAT.ShortIntron

Cheers
Ian

From carsonhh at gmail.com Fri Apr 18 09:35:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 18 Apr 2014 09:35:51 -0600
Subject: [maker-devel] Short Introns
In-Reply-To:
References:
Message-ID:

Look at the names of those genes. The original name will let you know where each one came from because it will contain augustus, genemark, snap, etc. You will also want to open up the contig containing those genes in a viewer like Apollo (http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz). See if the short intron is part of the CDS or the UTR. If it's UTR, then it has evidence support from an EST, which either means there are problems with the EST/cDNA evidence or it's real.
For those, even if they are real, you can just trim them off. If it's part of the CDS, then investigate whether it is suggested by EST or protein evidence, or if the ab initio predictor called it (sometimes the ab initio predictor calls things to force an ORF to work). This can sometimes be indicative of assembly issues in that region.

--Carson

From michael.seidl at wur.nl Tue Apr 22 08:27:18 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:27:18 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
Message-ID:

Hi,

I have a question on the post-processing of my maker output. I finished a maker run on a draft genome (231 scaffolds) without an error. To get a merged gff3 I run ~/local_progs/maker/bin/gff3_merge -d master_datastore_index.log. However, I realized that it contains, next to GFF3-conformant output, thousands of lines of array refs, e.g. ARRAY(0x188a8578).
The total number of scaffolds produced is correct; however, I have my doubts about whether I successfully retrieved all annotations. Could you maybe point me to a possible solution?

Thanks in advance
Michael

From carsonhh at gmail.com Tue Apr 22 08:31:16 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:31:16 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References:
Message-ID:

I've never seen this. What version of MAKER are you using?

--Carson

From michael.seidl at wur.nl Tue Apr 22 08:37:33 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:37:33 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Hi Carson,

I am using maker 2.31.
Thanks
Michael

-- *Michael F Seidl, PhD* Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl

From carsonhh at gmail.com Tue Apr 22 08:39:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:39:51 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Could you check the individual contig GFF3 files before the merge? Do any of those contain array refs? Also, is it exactly 2.31 or the current 2.31.3?
--Carson
From michael.seidl at wur.nl Tue Apr 22 08:43:44 2014
From: michael.seidl at wur.nl (Michael Seidl)
Date: Tue, 22 Apr 2014 16:43:44 +0200
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt wrote:
> any

Dear Carson,

maker -version returns 2.31. Yes, the individual scaffolds also seem to contain ARRAY refs, e.g. find -name "*gff" | xargs grep "ARRAY":

./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570)
./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088

Cheers
M
From carsonhh at gmail.com Tue Apr 22 08:46:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 08:46:34 -0600
Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge'
In-Reply-To:
References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl>
Message-ID:

Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Thanks,
Carson
From a.priyam at qmul.ac.uk Tue Apr 22 11:45:45 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Tue, 22 Apr 2014 23:15:45 +0530
Subject: [maker-devel] is using est_reads option safe?
Message-ID:

Hi,

I need to run MAKER against a genome with both raw (FASTQ) and assembled (FASTA) RNA-Seq data. I point MAKER to the assembled data using the est= option in maker_opts.ctl. Looking for how to point MAKER to the raw reads, I came across this thread https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where Dr. Carson Holt points out that est_gff should be used. However, from MAKER's run log it seems that the est_reads option is not deprecated, just hidden from plain sight by excluding it from maker_opts.ctl. So I set the est_reads option in maker_opts.ctl, and MAKER parses the control files and runs just fine.

Now I am left wondering if it's safe to use est_reads. As in, could it impact the predicted set negatively?

-- Priyam

From carsonhh at gmail.com Tue Apr 22 12:02:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Apr 2014 12:02:56 -0600
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

The est_reads option doesn't do anything. It is in the run log for backwards compatibility with old jobs because MAKER has a restart capability (i.e. people can rerun new MAKER versions against old MAKER output in the same directory, and it can reuse old raw results to avoid rerunning analysis steps). The est_reads option was originally there for developer experimentation, but then it went away. You need to use an external tool like TopHat and Cufflinks to align short reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough option you mentioned). Or you can assemble them without alignment using something like Trinity (then you can provide that result to the est= option because it will be in FASTA format).
You should not use raw reads directly with MAKER, you need to preprocess them using one of the methods mentioned for them to be useful. Thanks, Carson On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >Hi, > >I need to run MAKER against a genome with both raw (FASTQ) and >assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >est= options in maker_opts.ctl. Looking for how to point MAKER to the >raw reads I came across this thread >https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >Dr. Carlson Holt points out that est_gff should be used. However, from >MAKER's run log it seems that est_reads option is not deprecated, just >hidden from plain sight by excluding it from maker_opts.ctl. So I set >est_reads option in maker_opts.ctl and MAKER parses the control files >and runs just fine. > >Now I am left wondering if it's safe to use est_reads. As in, could it >impact the predicted set negatively? > >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Apr 22 13:10:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 13:10:46 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl> Message-ID: The issue was indeed caused by a bug in using the other_gff= file option. Could you place the attached file in .../maker/lib/. Then you can rerun maker to test if it fixes it ('maker -a' for fast rerun without analysis rerun). 
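One way to tell whether a rerun actually got rid of the stray lines is to count them, along the same lines as the find/grep check earlier in this thread. This is only a sketch: count_array_lines is a hypothetical helper, and it assumes the bad lines always begin with "ARRAY(0x" as in the reported output.

```shell
#!/bin/sh
# Hypothetical helper: count stray "ARRAY(0x...)" lines in every *.gff
# file below a directory. A count of 0 suggests the output is clean.
count_array_lines() {
    find "$1" -type f -name '*.gff' -exec cat {} + | grep -c '^ARRAY(0x'
}
```

Calling it on the job's output directory before and after the rerun gives two numbers to compare.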
Alternately if you don't feel like rerunning everything, you can also filter out the lines using --> grep -v "ARRAY" file.gff Since the other_gff file is not used in any part of the analysis and is just a convenience option that prints any text given to it into the final GFF3 file, then filtering them out is the same as if you would have left other_gff blank when running MAKER. You can then use 'gff3_merge -s tophat.gff merged_genome.gff' to merge the desired extra lines back into your file outside of MAKER. Thanks, Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 12:29 PM To: Carson Holt Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Hi Carson, I uploaded the files as an archive. Thanks Michael On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt wrote: > In the base maker.output directory for the job, there will be a file with a > .db extension. Could you send that as well? I'm leaning towards this being > something odd happening with the GFF3 files used as input. Particularly the > other_gff= file. Could you upload this file as well --> > /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/toph > at.gff3. > > --Carson > > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 8:56 AM > To: Carson Holt > > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Should be uploading right now... 
> > Thanks Michael > > > > On Tue, Apr 22, 2014 at 4:46 PM, Carson Holt > > wrote: > Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it > here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > Thanks, > Carson > > > From: Michael Seidl > >> > Date: Tuesday, April 22, 2014 at 8:43 AM > To: Carson Holt > o:carsonhh at gmail.com>>> > Cc: > "maker-devel at yandell-lab.org devel at yandell-lab.org>" > devel at yandell-lab.org>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > > On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt > o:carsonhh at gmail.com>>> wrote: > any > > Dear Carson, > > maker -version returns 2.31. Yes, also the individual scaffolds seem to > contain ARRAY refs, e.g. > find -name "*gff" | xargs grep "ARRAY": > > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 > > Cheers > M > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. 
Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: michael.seidl at wur.nl > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > > -- Michael F Seidl, PhD Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GFFDB.pm Type: text/x-perl-script Size: 52153 bytes Desc: not available URL: From carsonhh at gmail.com Tue Apr 22 14:35:31 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Apr 2014 14:35:31 -0600 Subject: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' In-Reply-To: References: <71a8c1de980642b3b2169e1c016a016a@SCOMP0940.wurnet.nl> <155dca02dbb84844930703f598f57635@SCOMP0939.wurnet.nl> Message-ID: You can provide a comma separated list of files to est_gff. Also from experience cufflinks gives far better results than tophat. Tophat tends to have a lot of false positives that adversely affect the overall quality of gene models, so I usually recommend that people use cufflinks output and not even include the tophat results in their run. Thanks, Carson From: Michael Seidl Date: Tuesday, April 22, 2014 at 2:30 PM To: Carson Holt Subject: Re: [maker-devel] thousands of array-refs in merged .gff after 'gff3_merge' Dear Carson, thanks a lot I will try. 
More importantly, you pointed me to a mistake in my procedure which will
make me rerun maker anyway :p

I want maker to use the tophat.gff next to cufflinks est (fa + gff) as well
as a protein.fa. I provide them currently as follows:

#-----EST Evidence (for best results provide a file for at least one)
est= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.cds.fa #set of ESTs or assembled mRNA-seq
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/transcripts.gff3 #aligned ESTs or mRNA-seq from a
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= /home/michael/data/side/alternaria/maker_annotation/fungal_proteins.fa #protein sequence file in fasta format (i.e. from mu
protein_gff= #aligned protein homology evidence from an external GFF3 file

Can I give the tophat.gff as altest_gff, or does maker internally use
est_gff and altest_gff differently?

Sorry for this question, but I had not yet realized that other_gff is
ignored during the maker run.

Thanks a lot
Michael

On Tue, Apr 22, 2014 at 9:10 PM, Carson Holt wrote:

> The issue was indeed caused by a bug in using the other_gff= file option.
> Could you place the attached file in .../maker/lib/. Then you can rerun maker
> to test if it fixes it ('maker -a' for fast rerun without analysis rerun).
>
> Alternately if you don't feel like rerunning everything, you can also filter
> out the lines using --> grep -v "ARRAY" file.gff
>
> Since the other_gff file is not used in any part of the analysis and is just a
> convenience option that prints any text given to it into the final GFF3 file,
> then filtering them out is the same as if you would have left other_gff blank
> when running MAKER.
You can then use 'gff3_merge -s tophat.gff > merged_genome.gff' to merge the desired extra lines back into your file > outside of MAKER. > > Thanks, > Carson > > > > From: Michael Seidl > > Date: Tuesday, April 22, 2014 at 12:29 PM > To: Carson Holt > > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Hi Carson, > > I uploaded the files as an archive. > > Thanks > Michael > > > On Tue, Apr 22, 2014 at 5:04 PM, Carson Holt > > wrote: > In the base maker.output directory for the job, there will be a file with a > .db extension. Could you send that as well? I'm leaning towards this being > something odd happening with the GFF3 files used as input. Particularly the > other_gff= file. Could you upload this file as well --> > /home/michael/data/side/alternaria/maker_annotation/Alternaria-CBS-916.96/toph > at.gff3. > > --Carson > > > From: Michael Seidl > >> > Date: Tuesday, April 22, 2014 at 8:56 AM > To: Carson Holt > o:carsonhh at gmail.com>>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > Should be uploading right now... > > Thanks Michael > > > > On Tue, Apr 22, 2014 at 4:46 PM, Carson Holt > o:carsonhh at gmail.com>>> wrote: > Could you pack up this directory for me --> /84/ED/scaffold3.1/ and upload it > here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > Thanks, > Carson > > > From: Michael Seidl > > l at wur.nl>>>> > Date: Tuesday, April 22, 2014 at 8:43 AM > To: Carson Holt > o:carsonhh at gmail.com>> ilto:carsonhh at gmail.com>>> > Cc: > "maker-devel at yandell-lab.org devel at yandell-lab.org> yandell-lab.org -lab.org>>" > devel at yandell-lab.org> yandell-lab.org -lab.org>>> > Subject: Re: [maker-devel] thousands of array-refs in merged .gff after > 'gff3_merge' > > > On Tue, Apr 22, 2014 at 4:39 PM, Carson Holt > o:carsonhh at gmail.com>> ilto:carsonhh at gmail.com>>> wrote: > any > > Dear Carson, > > maker -version returns 2.31. 
Yes, also the individual scaffolds seem to > contain ARRAY refs, e.g. > find -name "*gff" | xargs grep "ARRAY": > > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x41f6ea0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb87d888) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xd343528) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xb12fc48) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xde02488) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x8d4c698) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x447a8a0) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x4390048) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xdbb4e00) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xe3f1790) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0x438d570) > ./84/ED/scaffold3.1/theVoid.scaffold3.1/evidence_6.gff:ARRAY(0xae00088 > > Cheers > M > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: > michael.seidl at wur.nl mailto:michael.seidl at wur.nl>> > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > > > > > > -- > Michael F Seidl, PhD > Research Fellow (Postdoc) > Laboratory of Phytopathology > Wageningen University > P.O. Box 8025, 6700 EE Wageningen > Wageningen Campus, building 107 (Radix) > Droevendaalsesteeg 1, 6708 PB Wageningen > > Tel.: +31-317-481288 > Fax: +31-317-483412 > > Email: michael.seidl at wur.nl > Website: http://www.php.wur.nl/UK/ > Twitter: @MFSeidl > > www.disclaimer-uk.wur.nl > -- Michael F Seidl, PhD Research Fellow (Postdoc) Laboratory of Phytopathology Wageningen University P.O. 
Box 8025, 6700 EE Wageningen Wageningen Campus, building 107 (Radix) Droevendaalsesteeg 1, 6708 PB Wageningen Tel.: +31-317-481288 Fax: +31-317-483412 Email: michael.seidl at wur.nl Website: http://www.php.wur.nl/UK/ Twitter: @MFSeidl www.disclaimer-uk.wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.priyam at qmul.ac.uk Wed Apr 23 03:55:37 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Wed, 23 Apr 2014 15:25:37 +0530 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: Thanks, Carson. I now understand that I shouldn't use est_reds options. Does MAKER utilise est_gff for prediction or simply passes the annotations through to the output GFF? In that case how is it different from using other_gff / model_gff (what's the difference between these two?) I have both assembled and raw reads. Is it sufficient to just use the assembled set? -- Priyam On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote: > The est_reads option doesn't do anything. It in the run log for backwards > compatibility with old jobs because MAKER has a restart capability (i.e. > people can rerun new MAKER versions against old MAKER output in the same > directory - it can reuse old raw results to avoid rerunning analysis > steps). The est_reads was originally there for developer experimentation, > but then it went away. > > You need to use an external tool like tophat and cufflinks to align short > reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough > option you mentioned). Or you can assemble then without alignment using > something like trinity (then you can provide that result to the est= > options because it will be in fasta format). > > You should not use raw reads directly with MAKER, you need to preprocess > them using one of the methods mentioned for them to be useful. 
> > Thanks, > Carson > > > > On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: > >>Hi, >> >>I need to run MAKER against a genome with both raw (FASTQ) and >>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >>est= options in maker_opts.ctl. Looking for how to point MAKER to the >>raw reads I came across this thread >>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >>Dr. Carlson Holt points out that est_gff should be used. However, from >>MAKER's run log it seems that est_reads option is not deprecated, just >>hidden from plain sight by excluding it from maker_opts.ctl. So I set >>est_reads option in maker_opts.ctl and MAKER parses the control files >>and runs just fine. >> >>Now I am left wondering if it's safe to use est_reads. As in, could it >>impact the predicted set negatively? >> >>-- Priyam >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Wed Apr 23 08:43:54 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 23 Apr 2014 08:43:54 -0600 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: est_gff is the equivalent of est=, but because the alignment structure is already in the GFF3, I don't need to align sequence with blastn/exonerate. model_gff and pred_gff are essentially the same with the difference being that model_gff can be kept in the final results even without supporting evidence, but pred_gff won't. Pred_gff needs evidence support because it is a potential model, where model_gff is considered a known model even if the structure of that model may be uncertain. other_gff is just a convenience method for passing through GFF3 features to the final result. 
It's impossible to have MAKER be aware of every kind of possible entry, so
if you want something more exotic in the final output (sequence variant
information, alternate alleles, promoter and methylation sites, etc.) then
you can pass it in there and it will just be printed into the file. It's
basically the equivalent of concatenating two GFF3 files together, but it
handles the proper reordering of sequence information at the end of the
GFF3 file (because technically you can't just concatenate GFF3 files
end-to-end). You can also use the gff3_merge tool that comes with MAKER to
get the same effect.

--Carson

On 4/23/14, 3:55 AM, "Anurag Priyam" wrote:

>Thanks, Carson.
>
>I now understand that I shouldn't use est_reds options.
>
>Does MAKER utilise est_gff for prediction or simply passes the
>annotations through to the output GFF? In that case how is it
>different from using other_gff / model_gff (what's the difference
>between these two?)
>
>I have both assembled and raw reads. Is it sufficient to just use the
>assembled set?
>
>-- Priyam
>
>On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote:
>> The est_reads option doesn't do anything. It in the run log for backwards
>> compatibility with old jobs because MAKER has a restart capability (i.e.
>> people can rerun new MAKER versions against old MAKER output in the same
>> directory - it can reuse old raw results to avoid rerunning analysis
>> steps). The est_reads was originally there for developer experimentation,
>> but then it went away.
>>
>> You need to use an external tool like tophat and cufflinks to align short
>> reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough
>> option you mentioned). Or you can assemble then without alignment using
>> something like trinity (then you can provide that result to the est=
>> options because it will be in fasta format).
>> >> You should not use raw reads directly with MAKER, you need to preprocess >> them using one of the methods mentioned for them to be useful. >> >> Thanks, >> Carson >> >> >> >> On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >> >>>Hi, >>> >>>I need to run MAKER against a genome with both raw (FASTQ) and >>>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >>>est= options in maker_opts.ctl. Looking for how to point MAKER to the >>>raw reads I came across this thread >>>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >>>Dr. Carlson Holt points out that est_gff should be used. However, from >>>MAKER's run log it seems that est_reads option is not deprecated, just >>>hidden from plain sight by excluding it from maker_opts.ctl. So I set >>>est_reads option in maker_opts.ctl and MAKER parses the control files >>>and runs just fine. >>> >>>Now I am left wondering if it's safe to use est_reads. As in, could it >>>impact the predicted set negatively? >>> >>>-- Priyam >>> >>>_______________________________________________ >>>maker-devel mailing list >>>maker-devel at box290.bluehost.com >>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> From kdelmore at zoology.ubc.ca Tue Apr 22 22:48:08 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 22 Apr 2014 21:48:08 -0700 Subject: [maker-devel] problem with dsindex Message-ID: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca> I am having some trouble with the dsindex tool. I used the fasta_tool to split my original multifasta file and ran maker with the ?base and ?g flags. I then used the dsindex tool to summarize results from each fasta. The tool finished without an error message and pointed me to where the files should be but when I went to that directory there was no datastore and the index.log said that it had started on each of the fastas but not finished. 
I got around this problem using gff3_merge by using the -o option and
providing paths to the gff files but this is not working with the
fasta_merge tool. I don't want to just cat the files together because
I want to be sure the merged gff and protein.fasta files are the same for
downstream annotation steps. I've included examples of the commands I used
below and the output from dsindex. Note that the individual fastas
finished without errors and produced datastores.

I would really appreciate any input you might have with this problem and
THANK YOU for developing such a user friendly pipeline.

/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides

maker/bin/maker -dsindex -fix_nucleotides
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/placed.maker.output/placed_datastore ##this directory was not generated
To access files for individual sequences use the datastore index:
/placed.maker.output/placed_master_datastore_index.log

/maker/bin/gff3_merge -o placed.gff *

/maker/bin/fasta_merge -o placed.all 1.maker.proteins.fasta 2.maker.proteins.fasta ##this did not work

From carson.holt at genetics.utah.edu Wed Apr 23 08:51:59 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 23 Apr 2014 14:51:59 +0000
Subject: [maker-devel] problem with dsindex
In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca>
Message-ID:

I don't think all your contigs are finished or you did not supply the -base
tag when running -dsindex. If it says STARTED rather than FINISHED, then the
output files for that contig are missing from the directory it is looking at.
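The STARTED/FINISHED check can be done mechanically on the datastore index log. The sketch below assumes the log is tab-separated with the sequence ID in the first column, the output path in the second and the status in the third; that layout is an assumption here, so check it against your own log first. unfinished_contigs is a hypothetical helper name, not a MAKER command.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of sequences whose most recent
# entry in a datastore index log is anything other than FINISHED.
# Assumes tab-separated columns: sequence ID, output path, status.
unfinished_contigs() {
    awk -F '\t' '
        { last[$1] = $3 }
        END { for (id in last) if (last[id] != "FINISHED") print id }
    ' "$1" | sort
}
```

For the job in this thread it would be called as `unfinished_contigs placed.maker.output/placed_master_datastore_index.log`. An empty result would mean every sequence in the log reached FINISHED.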
For example this is how you should be running everything -->

/maker/bin/fasta_tool --split placed.fasta
mpiexec -n 4 /maker/bin/maker -base placed -g 1.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 2.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 3.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 4.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 5.fasta -fix_nucleotides

Now all will write to placed.maker.output

Then you need to do this -->

maker/bin/maker -dsindex -base placed -g placed.fasta

Then it will rebuild the index for
placed.maker.output/placed_master_datastore_index.log

Thanks,
Carson

On 4/22/14, 10:48 PM, "kdelmore at zoology.ubc.ca" wrote:

>I am having some trouble with the dsindex tool. I used the fasta_tool to
>split my original multifasta file and ran maker with the -base and -g
>flags. I then used the dsindex tool to summarize results from each fasta.
>The tool finished without an error message and pointed me to where the
>files should be but when I went to that directory there was no datastore
>and the index.log said that it had started on each of the fastas but not
>finished. I got around this problem using gff3_merge by using the -o
>option and providing paths to the gff files but this is not working with
>the fasta_merge tool. I don't want to just cat the files together because
>I want to be sure the merged gff and protein.fasta files are the same for
>downstream annotation steps. I've included examples of the commands I used
>below and the output from dsindex. Note that the individual fastas
>finished without errors and produced datastores.
>
>I would really appreciate any input you might have with this problem and
>THANK YOU for developing such a user friendly pipeline.
>
>/maker/bin/fasta_tool --split placed.fasta
>
>mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides
>
>maker/bin/maker -dsindex -fix_nucleotides
>STATUS: Parsing control files...
>STATUS: Processing and indexing input FASTA files... >STATUS: Setting up database for any GFF3 input... >A data structure will be created for you at: >/placed.maker.output/placed_datastore ##this directory was not generated >To access files for individual sequences use the datastore index: >/placed.maker.output/placed_master_datastore_index.log > >/maker/bin/gff3_merge -o placed.gff * > >/maker/bin/fasta_merge ?o placed.all 1.maker.proteins.fasta >2.maker.proteins.fasta ##this did not work > > > From carsonhh at gmail.com Wed Apr 23 08:57:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 23 Apr 2014 08:57:34 -0600 Subject: [maker-devel] problem with dsindex In-Reply-To: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca> References: <60a6fff977c271a1601a9f96cfd2d2d9.squirrel@webmail.zoology.ubc.ca> Message-ID: Also fasta_merge works differently than gff3_merge. It requires the datastore index because it is trying to find directories and the 'type' and 'group' the fasta files in those directories. Without the datastore index, it is the equivalent of 'cat file1.fa file2.fa > file3.fa'. It also requires the '-i' flag when specifying individual fasta files. --Carson On 4/22/14, 10:48 PM, "kdelmore at zoology.ubc.ca" wrote: >I am having some trouble with the dsindex tool. I used the fasta_tool to >split my original multifasta file and ran maker with the ?base and ?g >flags. I then used the dsindex tool to summarize results from each fasta. >The tool finished without an error message and pointed me to where the >files should be but when I went to that directory there was no datastore >and the index.log said that it had started on each of the fastas but not >finished. I got around this problem using gff3_merge by using the ?o >option and providing paths to the gff files but this is not working with >the fasta_merge tool. 
I don?t want to just cat the files together because >I want to be sure the merged gff and protein.fasta files are the same for >downstream annotation steps. I?ve included examples of the commands I used >below and the output from dsindex. Note that the individual fastas >finished without errors and produced datastores. > >I would really appreciate any input you might have with this problem and >THANK YOU for developing such a user friendly pipeline. > >/maker/bin/fasta_tool --split placed.fasta > >mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides > >maker/bin/maker -dsindex -fix_nucleotides >STATUS: Parsing control files... >STATUS: Processing and indexing input FASTA files... >STATUS: Setting up database for any GFF3 input... >A data structure will be created for you at: >/placed.maker.output/placed_datastore ##this directory was not generated >To access files for individual sequences use the datastore index: >/placed.maker.output/placed_master_datastore_index.log > >/maker/bin/gff3_merge -o placed.gff * > >/maker/bin/fasta_merge ?o placed.all 1.maker.proteins.fasta >2.maker.proteins.fasta ##this did not work > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From a.priyam at qmul.ac.uk Thu Apr 24 01:28:38 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Thu, 24 Apr 2014 12:58:38 +0530 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: You say est_gff is the equivalent of est= (except that alignment structure is a part of gff). What would MAKER do if I set both est= and est_gff= options in maker_opts.ctl? Will it ignore est=? -- Priyam On Wed, Apr 23, 2014 at 8:13 PM, Carson Holt wrote: > est_gff is the equivalent of est=, but because the alignment structure is > already in the GFF3, I don't need to align sequence with blastn/exonerate. 
> model_gff and pred_gff are essentially the same with the difference being > that model_gff can be kept in the final results even without supporting > evidence, but pred_gff won't. Pred_gff needs evidence support because it > is a potential model, where model_gff is considered a known model even if > the structure of that model may be uncertain. > > other_gff is just a convenience method for passing through GFF3 features > to the final result. It's impossible to have MAKER be aware of every kind > of possible entry, so if you have something more exotic in the final > output (sequence variant information, alternate alleles, promotor and > methylation site, etc.) then you can pass it in there and it will just be > printed into the file. It's basically the equivalent of concatenating two > GFF3 files together, but it handles the proper reordering of sequence > information at the end of the GFF3 file (because technically you can't > just concatenate GFF3 files end-to-end). You can also use the gff3_merge > tool that comes with MAKER to get the same effect. > > --Carson > > > > On 4/23/14, 3:55 AM, "Anurag Priyam" wrote: > >>Thanks, Carson. >> >>I now understand that I shouldn't use est_reds options. >> >>Does MAKER utilise est_gff for prediction or simply passes the >>annotations through to the output GFF? In that case how is it >>different from using other_gff / model_gff (what's the difference >>between these two?) >> >>I have both assembled and raw reads. Is it sufficient to just use the >>assembled set? >> >>-- Priyam >> >>On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote: >>> The est_reads option doesn't do anything. It in the run log for >>>backwards >>> compatibility with old jobs because MAKER has a restart capability (i.e. >>> people can rerun new MAKER versions against old MAKER output in the same >>> directory - it can reuse old raw results to avoid rerunning analysis >>> steps). 
The est_reads was originally there for developer >>>experimentation, >>> but then it went away. >>> >>> You need to use an external tool like tophat and cufflinks to align >>>short >>> reads and assemble them into likely exon blocks (i.e. the GFF3 >>>passthrough >>> option you mentioned). Or you can assemble then without alignment using >>> something like trinity (then you can provide that result to the est= >>> options because it will be in fasta format). >>> >>> You should not use raw reads directly with MAKER, you need to preprocess >>> them using one of the methods mentioned for them to be useful. >>> >>> Thanks, >>> Carson >>> >>> >>> >>> On 4/22/14, 11:45 AM, "Anurag Priyam" wrote: >>> >>>>Hi, >>>> >>>>I need to run MAKER against a genome with both raw (FASTQ) and >>>>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using >>>>est= options in maker_opts.ctl. Looking for how to point MAKER to the >>>>raw reads I came across this thread >>>>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where >>>>Dr. Carlson Holt points out that est_gff should be used. However, from >>>>MAKER's run log it seems that est_reads option is not deprecated, just >>>>hidden from plain sight by excluding it from maker_opts.ctl. So I set >>>>est_reads option in maker_opts.ctl and MAKER parses the control files >>>>and runs just fine. >>>> >>>>Now I am left wondering if it's safe to use est_reads. As in, could it >>>>impact the predicted set negatively? >>>> >>>>-- Priyam >>>> >>>>_______________________________________________ >>>>maker-devel mailing list >>>>maker-devel at box290.bluehost.com >>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> > > From carsonhh at gmail.com Thu Apr 24 08:15:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 24 Apr 2014 08:15:07 -0600 Subject: [maker-devel] is using est_reads option safe? In-Reply-To: References: Message-ID: It will use both. 
You can also provide multiple files to either option using comma-separated lists.

--Carson

On 4/24/14, 1:28 AM, "Anurag Priyam" wrote:

>You say est_gff is the equivalent of est= (except that alignment
>structure is a part of gff). What would MAKER do if I set both est=
>and est_gff= options in maker_opts.ctl? Will it ignore est=?
>
>-- Priyam
>
>On Wed, Apr 23, 2014 at 8:13 PM, Carson Holt wrote:
>> est_gff is the equivalent of est=, but because the alignment structure is
>> already in the GFF3, I don't need to align sequence with blastn/exonerate.
>> model_gff and pred_gff are essentially the same with the difference being
>> that model_gff can be kept in the final results even without supporting
>> evidence, but pred_gff won't. Pred_gff needs evidence support because it
>> is a potential model, where model_gff is considered a known model even if
>> the structure of that model may be uncertain.
>>
>> other_gff is just a convenience method for passing through GFF3 features
>> to the final result. It's impossible to have MAKER be aware of every kind
>> of possible entry, so if you have something more exotic in the final
>> output (sequence variant information, alternate alleles, promoter and
>> methylation site, etc.) then you can pass it in there and it will just be
>> printed into the file. It's basically the equivalent of concatenating two
>> GFF3 files together, but it handles the proper reordering of sequence
>> information at the end of the GFF3 file (because technically you can't
>> just concatenate GFF3 files end-to-end). You can also use the gff3_merge
>> tool that comes with MAKER to get the same effect.
>>
>> --Carson
>>
>> On 4/23/14, 3:55 AM, "Anurag Priyam" wrote:
>>
>>>Thanks, Carson.
>>>
>>>I now understand that I shouldn't use the est_reads option.
>>>
>>>Does MAKER utilise est_gff for prediction or simply pass the
>>>annotations through to the output GFF? In that case how is it
>>>different from using other_gff / model_gff (what's the difference
>>>between these two?)
>>>
>>>I have both assembled and raw reads. Is it sufficient to just use the
>>>assembled set?
>>>
>>>-- Priyam
>>>
>>>On Tue, Apr 22, 2014 at 11:32 PM, Carson Holt wrote:
>>>> The est_reads option doesn't do anything. It is in the run log for backwards
>>>> compatibility with old jobs because MAKER has a restart capability (i.e.
>>>> people can rerun new MAKER versions against old MAKER output in the same
>>>> directory - it can reuse old raw results to avoid rerunning analysis
>>>> steps). The est_reads was originally there for developer experimentation,
>>>> but then it went away.
>>>>
>>>> You need to use an external tool like tophat and cufflinks to align short
>>>> reads and assemble them into likely exon blocks (i.e. the GFF3 passthrough
>>>> option you mentioned). Or you can assemble them without alignment using
>>>> something like trinity (then you can provide that result to the est=
>>>> options because it will be in fasta format).
>>>>
>>>> You should not use raw reads directly with MAKER, you need to preprocess
>>>> them using one of the methods mentioned for them to be useful.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> On 4/22/14, 11:45 AM, "Anurag Priyam" wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>I need to run MAKER against a genome with both raw (FASTQ) and
>>>>>assembled (FASTA) RNA-Seq data. I point MAKER to assembled data using
>>>>>est= options in maker_opts.ctl. Looking for how to point MAKER to the
>>>>>raw reads I came across this thread
>>>>>https://groups.google.com/forum/#!topic/maker-devel/oLEXJ4z4fDY where
>>>>>Dr. Carson Holt points out that est_gff should be used. However, from
>>>>>MAKER's run log it seems that est_reads option is not deprecated, just
>>>>>hidden from plain sight by excluding it from maker_opts.ctl. So I set
>>>>>est_reads option in maker_opts.ctl and MAKER parses the control files
>>>>>and runs just fine.
>>>>>
>>>>>Now I am left wondering if it's safe to use est_reads. As in, could it
>>>>>impact the predicted set negatively?
>>>>>
>>>>>-- Priyam

From anurag08priyam at gmail.com Thu Apr 24 08:26:24 2014
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Thu, 24 Apr 2014 19:56:24 +0530
Subject: [maker-devel] is using est_reads option safe?
In-Reply-To:
References:
Message-ID:

That answers all my questions. Thanks, Carson.

-- Priyam

On Thu, Apr 24, 2014 at 7:45 PM, Carson Holt wrote:
> It will use both. you can also provide multiple files to either using
> comma separated lists.
>
> --Carson

From matthew.macmanes at unh.edu Sat Apr 26 08:56:25 2014
From: matthew.macmanes at unh.edu (Matthew MacManes)
Date: Sat, 26 Apr 2014 10:56:25 -0400
Subject: [maker-devel] Use of each() on hash
Message-ID:

Hello,

I am getting a large number of errors while running maker on my ubuntu server.

Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x2045200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x837200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior, Perl interpreter: 0x9d1200 at /usr/local/lib/perl/5.18.2/forks.pm line 1736.

It is unclear how this affects the results or performance of the software, but these errors are repeated thousands of times in even a small run.

For the record: Maker 2.31, Ubuntu 14.04, perl 5.18.2, MPI via OpenMPI

Compiled perl modules using ./build

Thanks for any insight anyone may have.

__________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

From carsonhh at gmail.com Sat Apr 26 09:26:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 09:26:24 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References:
Message-ID:

The message appears to be coming from forks.pm. Probably a warning added to perl 5.18.2 which is really really new (other versions don't care about this), and most developers would not consider 5.18 a fully stable release for production purposes (it will have lots of test features and messages that will get improved or dropped rather quickly). You can try updating the forks module from CPAN. Otherwise I would ignore it, as forks is sufficiently tested to know it works (it's not a MAKER module, it's a widely used CPAN module - literally tens of thousands of scripts use it worldwide). The authors of forks.pm will take steps to silence the warning rather quickly, or the warning will be removed from the perl interpreter altogether.

Thanks,
Carson

Sent from my iPhone

From cjfields at illinois.edu Sat Apr 26 21:34:16 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 27 Apr 2014 03:34:16 +0000
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References:
Message-ID: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>

See this RT ticket:

https://rt.cpan.org/Public/Bug/Display.html?id=86910

The specific warning in question is there for a good reason. Reini Urban wrote about it recently and why it is bad:

http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html

There is a possible 2-line fix, mainly changing a while loop to a for loop, but the bug (originally reported in summer 2013) is still unfortunately open.

Just a note, I don't agree that perl 5.18.2 is a development release. Even numbered minor releases (5.10, 5.12...) are considered stable/production, odd numbered ones (5.19) are developer. I do agree that initial .0 "patch" releases (e.g. 5.18.0) are generally to be avoided, but I always try to use a more recent version of perl when possible. This version is two releases past the .0, and perl 5.20 (next stable) is due next month.

chris

From carsonhh at gmail.com Sat Apr 26 22:06:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:06:46 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
Message-ID:

Yah, I had already seen that ticket. It's related to changing the function from a while loop to a foreach loop just to suppress the warning. Not sure why the forks.pm maintainer hasn't looked at it, but I imagine he will probably just do something more like --> no warnings qw(each); or whatever would suppress that warning without altering anything else in the code.

I wouldn't say 5.18 is a development release. What I said is that it's not good for 'production'. The problem is that most systems still use 5.10 and 5.12, with a very few only recently moving to 5.16 (amazon's EC2 images for example). So you will find that issues with even very popular CPAN modules (as we see here) will be more common in something like 5.18.X. Not because 5.18 is flawed, or buggy, but because it's not yet used enough to flush out all the secondary issues it can cause elsewhere in the wider world of perl.

Thanks,
Carson

From carsonhh at gmail.com Sat Apr 26 22:51:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 26 Apr 2014 22:51:30 -0600
Subject: [maker-devel] Use of each() on hash
In-Reply-To:
References: <3498780C-70F2-4B80-B1B0-13F46668B802@illinois.edu>
Message-ID:

If you don't want to wait for the forks.pm maintainer to alter his code and submit an update to CPAN, you should be able to suppress the warning by manually editing forks.pm line 1736 yourself.

Change it from this -->

    $write = each %WRITE;

To this (make sure to include the {} brackets) -->

    {
        no warnings qw(internal);
        $write = each %WRITE;
    }

The issue is that the module's author has his code calling 'each', altering the hash, and then calling 'each' again, which causes a warning in perl 5.18+.
In the forks.pm case it's relatively innocuous because of how the value and 'each' function are being used (any hash reordering ends up being handled in an outer while loop).

Thanks,
Carson

From muriel.grosb at gmail.com Mon Apr 28 02:35:25 2014
From: muriel.grosb at gmail.com (Muriel Gros-Balthazard)
Date: Mon, 28 Apr 2014 10:35:25 +0200
Subject: [maker-devel] Repeat Library Construction : Exclusion of gene fragments
Message-ID: <535E12CD.9020302@gmail.com>

Hello!

I ran RepeatModeler and separated the output into ModelerID.lib and Modelerunknown.lib as explained in the protocol. In total, I have about 600 sequences in these two files.

I now want to exclude gene fragments. I downloaded all the plant protein sequences from UniProtDB and plan to use blastx. However, I don't know which parameters I should use for blastx, especially the -e value?
Thanks a lot for your help,
Muriel GB

From mhinsley at ebi.ac.uk Tue Apr 29 02:21:06 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Tue, 29 Apr 2014 09:21:06 +0100
Subject: [maker-devel] unexpected alternate splicing
Message-ID: <535F60F2.5050902@ebi.ac.uk>

Hi

We've just reinstalled maker 2.31 using mpich3 (3.1) and are delighted that file locking and other issues have been resolved. (I'm running maker across several nodes on the compute farm.) The maker code is identical: I took the previous tar.gz archive and made a clean build.

Using a copy of a previous configuration to test, the only differences I can see are that the location of some files has changed (the working directory is on a different file system) and that I'm using a bigger (unfiltered) repeat library.

The previous maker run produced 17393 genes and 17393 mRNAs, and this new version gives 15927 genes and 21328 mRNAs.

I have alt_splice=0:

$ grep splice ../maker_opts.ctl
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no

Any idea why I'm getting multiple mRNAs per gene?

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Tue Apr 29 06:59:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Apr 2014 06:59:04 -0600
Subject: [maker-devel] unexpected alternate splicing
In-Reply-To: <535F60F2.5050902@ebi.ac.uk>
References: <535F60F2.5050902@ebi.ac.uk>
Message-ID: <1653CD3E-CEB7-437E-88CC-0F65C9BDA931@gmail.com>

Are you using gff3 files as input? If so, could you send those to me? The extra mRNAs are probably coming from those.

--carson

Sent from my iPhone

From carson.holt at genetics.utah.edu Wed Apr 30 08:53:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 30 Apr 2014 14:53:29 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1398869131512.52399@uga.edu>
References: <1398869131512.52399@uga.edu>
Message-ID:

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani

From carsonhh at gmail.com Wed Apr 30 08:55:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Apr 2014 08:55:12 -0600
Subject: [maker-devel] FW: protein2genome gene models
Message-ID:

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From sjackman at gmail.com Wed Apr 30 17:25:17 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 30 Apr 2014 16:25:17 -0700
Subject: [maker-devel] est_forward and conflicting names
Message-ID:

Hi, Carson.

I've downloaded a number of genes from GenBank using Entrez Direct, which I'm using with est and protein to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I'm using est_forward to retain the gene names. This workflow is working well for me.

Some of the genes pulled in from GenBank have less useful names like orf1234 or other numeric IDs. When multiple evidence sequences map to the same location, how does est_forward choose which name to use? If it's chosen arbitrarily, could it be possible to choose the most common name instead?

Thanks,
Shaun