From jjin01 at mail.rockefeller.edu Sun Sep 1 03:17:07 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Sun, 1 Sep 2013 08:17:07 +0000
Subject: [maker-devel] error about DBD::SQLite::db

Dear all,

When I try to run maker on my test dataset, there is an error like this:

DBD::SQLite::db do failed: near ",": syntax error at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 1.
DBD::SQLite::db do failed: no such column: JUNC00000001 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 2.
DBD::SQLite::db do failed: no such column: JUNC00000002 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 3.
DBD::SQLite::db do failed: no such column: JUNC00000003 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 4.
DBD::SQLite::db do failed: no such column: JUNC00000004 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 5.
DBD::SQLite::db do failed: no such column: JUNC00000005 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 6.
DBD::SQLite::db do failed: no such column: JUNC00000006 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 7.
DBD::SQLite::db do failed: no such column: JUNC00000007 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 8.
DBD::SQLite::db do failed: no such column: JUNC00000008 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 9.
DBD::SQLite::db do failed: no such column: JUNC00000009 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 10.
DBD::SQLite::db do failed: no such column: JUNC00000010 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 11.
DBD::SQLite::db do failed: no such column: JUNC00000011 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 12.
DBD::SQLite::db do failed: no such column: JUNC00000012 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 13.
DBD::SQLite::db do failed: no such column: JUNC00000013 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 14.
DBD::SQLite::db do failed: no such column: JUNC00000014 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 15.
DBD::SQLite::db do failed: no such column: JUNC00000015 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 16.
DBD::SQLite::db do failed: no such column: JUNC00000016 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 17.
DBD::SQLite::db do failed: no such column: JUNC00000017 at /data/apps/maker/bin/../lib/GFFDB.pm line 441, <$IN> line 18.

The JUN*** IDs are the external ESTs I provide. Can anyone give me some suggestions?

Thanks!
Jingjing

From carsonhh at gmail.com Sun Sep 1 06:26:47 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 01 Sep 2013 07:26:47 -0400
Subject: [maker-devel] error about DBD::SQLite::db

Most likely an issue with your input file format. Try this GFF3 file validator --> http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Also make sure you are using the most recent version of MAKER.

--Carson
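Carson's first suggestion is to run the input files through a GFF3 validator. For a quick local pre-check, the kind of column-level rules such a validator enforces can be sketched in Python. This is a minimal, hypothetical script (`check_gff3_line` and `check_gff3` are illustrative names, not part of MAKER or of the modENCODE validator) and it checks far less than the online tool. Whether a malformed evidence file is what produced the SQLite errors above is an assumption; the thread does not confirm the cause.

```python
import sys

VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_line(line):
    """Return a list of problems found on one GFF3 feature line (empty if none)."""
    line = line.rstrip("\r\n")
    # Blank lines, ## directives and # comments are allowed anywhere.
    if not line.strip() or line.startswith("#"):
        return []
    if line.startswith(">"):
        return ["FASTA header outside a ##FASTA section (FASTA given as GFF3?)"]
    cols = line.split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    _seqid, _source, ftype, start, end, score, strand, phase, attrs = cols
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be integers")
    elif not 1 <= int(start) <= int(end):
        problems.append("start must be >= 1 and <= end")
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append(f"score must be '.' or a number, got {score!r}")
    if strand not in VALID_STRANDS:
        problems.append(f"invalid strand {strand!r}")
    if phase not in VALID_PHASES:
        problems.append(f"invalid phase {phase!r}")
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features need a phase of 0, 1 or 2")
    pairs = [p for p in attrs.strip(";").split(";") if p]
    if attrs != "." and any("=" not in p for p in pairs):
        problems.append("column 9 must be tag=value pairs separated by ';'")
    return problems


def check_gff3(lines):
    """Yield (line_number, problem) for each issue; stop at a ##FASTA directive."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break
        for problem in check_gff3_line(line):
            yield lineno, problem


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_gff3.py est_evidence.gff3
    with open(sys.argv[1]) as handle:
        for lineno, problem in check_gff3(handle):
            print(f"line {lineno}: {problem}")
```

Reporting line numbers mirrors the `<$IN> line N` counter in the DBD::SQLite messages, so any problems it flags can be lined up with the failing input lines.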
From uqslizbe at uq.edu.au Thu Sep 5 02:30:26 2013
From: uqslizbe at uq.edu.au (Selene Lizbeth Fernandez Valverde)
Date: Thu, 5 Sep 2013 17:30:26 +1000
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks

Hi all,

I'm currently using Maker to reannotate the genome of the marine sponge. We already have a set of Augustus predictions and gene models that I mapped back to the genome using the patched map2assembly script posted on the mailing list, as well as PASA transcripts (based on Trinity assemblies) and cufflinks transcripts.

I would like to include both Trinity and Cufflinks, as in some cases one outperforms the other. I'm currently planning to provide the Trinity/PASA assemblies as FASTA to the "est" option and the cufflinks assemblies as GFF3 to the "est_gff" option, but I'm wondering: will MAKER take into account both types of evidence? Would it be better to merge both PASA and cufflinks GFF3s using gff3_merge?

Thanks in advance for the advice,
Selene

**est_gff/est --> These are assumed to be correctly assembled and aligned around splice sites (MAKER uses exonerate to align around splice sites for ESTs in FASTA files). MAKER can use them to infer gene models directly (est2genome option), can use them as support for maintaining predictions, and can use them to modify structure and add UTR to predictions. If you let MAKER try and find alternative splice forms, they will be used to identify support for splice variants. How these cluster with other evidence will help MAKER infer gene boundaries in some cases. MAKER will also use splice sites inferred from the ESTs to inform gene predictors during the prediction step.

Selene Fernandez-Valverde Ph.D.
Postdoctoral Research Fellow
School of Biological Sciences
University of Queensland
St Lucia QLD 4072 Australia
uqslizbe at uq.edu.au
From carsonhh at gmail.com Thu Sep 5 06:04:43 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 07:04:43 -0400
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks

> 1. I'm wondering if MAKER will take into account both types of evidence?

Yes.

> 2. Would it be better to merge both PASA and cufflinks gff3s using gff3_merge?

You can provide them as a comma-separated list of files to the est_gff= option, or you can merge them using the gff3_merge script that comes with MAKER.

Unfortunately I have no single best option for which evidence types to include. Every evidence type can contribute in its own way to the final results. When you test different evidence types, try running on a single large contig and manually view the results in a browser.

Thanks,
Carson

From j.zohren at qmul.ac.uk Thu Sep 5 10:58:39 2013
From: j.zohren at qmul.ac.uk (Jasmin Zohren)
Date: Thu, 5 Sep 2013 16:58:39 +0100
Subject: [maker-devel] Maker in the cloud
Message-ID: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Dear Maker developers,

I've already contacted you a while ago about my annotation of the birch genome (Betula nana). As I am constantly running into problems using our cluster facilities at QMUL, I thought of moving into the cloud. As I am rather inexperienced in cloud computing, I have several questions:

1. To me it seems that there are two different Maker images on EC2 - ami-ea661f83 and ami-b10abed8 - which one is "the right one"?
2. Can I use this Maker AMI for the annotation of a whole genome, or is it only suitable for the tutorial tasks?
3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance; maybe that wasn't enough computing power? Or do you have another explanation for this?
4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I've also recently heard about a parallelisation module for use in the cloud developed by Era7, called "nispero", but I am not sure whether it is publicly available yet.
5. Do you have any experience of how long an annotation task in the cloud would take, and what the expected costs would be? The birch genome is only 500 MB in size and currently I am simply annotating it with a SNAP-trained HMM. However, in the future I will feed it with RNA-seq data as well.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London

intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk

From carsonhh at gmail.com Thu Sep 5 11:26:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 12:26:08 -0400
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Hello Jasmin,

I haven't used MAKER in parallel on the cloud before (just tutorial images); however, I believe there is an iPlant Atmosphere image available through iPlant with MAKER version 2.27. You can get a maximum of 16 CPUs per instance there. --> http://www.iplantcollaborative.org/discover/atmosphere

Alternatively, if you have any US-based collaborators, you can apply for a startup allocation on the Lonestar cluster via XSEDE (an allocation can be requested by any US-based researcher and only takes a few days to approve) --> https://www.xsede.org/

That cluster was used recently to process the largest genome ever annotated (the pine genome). Total run time will be less than a day on that cluster, because you can request thousands of CPUs for your job with very short queue wait times.

There is also work in progress to give access to MAKER on the same cluster via the iPlant Discovery Environment. I've CC'd Joshua Stein, who can correct me if I'm wrong, but I believe that resource would be available to non-US-based researchers as well, and will be available in the very near future (potentially within the next month or less).

Perhaps someone else on the mailing list may want to share their experience using MAKER on the cloud?

Thanks,
Carson
From barry.moore at genetics.utah.edu Thu Sep 5 13:06:05 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 5 Sep 2013 12:06:05 -0600
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
References: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Hi Jasmin,

Like Carson, my only significant experience with MAKER in the cloud is using it for our training; however, I'll make some comments based on experience on the cloud with some of our other tools:

There are several cloud architectures available now, but I only have experience with Amazon EC2, so all comments are only relevant there.

I wouldn't use any of the existing MAKER AMIs. All of them were created for tutorial purposes, and while they should work fine for a real annotation job, they will be out of date. At the very least, if you use one, start with it, but install current MAKER code and save it as a new AMI.

You can use MPI on the Amazon nodes, but it's not set up by default to run MPI between nodes. That can presumably be done, but we haven't done it, so there may be headaches involved; we just don't know for sure. However, you could split your input FASTA into several chunks of roughly equal size and fire up a different EC2 node for each FASTA file, then allow MAKER to use MPI to optimize parallelization on each node individually.

MAKER is really good at restarting if things fail, so with that in mind I'd suggest starting spot nodes, which can be 10X cheaper than regularly priced nodes. Amazon will kill a spot node as soon as someone comes along who is willing to pay full price, so you'd want a way (either manually checking and restarting nodes or scripting an AWS API solution) to check whether nodes finished and restart them if they did not, but you could save a lot of money by doing this.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ejr at stowers.org Fri Sep 6 13:34:32 2013
From: ejr at stowers.org (Ross, Eric)
Date: Fri, 6 Sep 2013 18:34:32 +0000
Subject: [maker-devel] maker-devel Digest, Vol 64, Issue 4

It wouldn't be too difficult to run MAKER using something like StarCluster. StarCluster manages the cluster and nodes for you.

http://star.mit.edu/cluster/

It's not too difficult to use.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org
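Barry's suggestion of splitting the input FASTA into several chunks of roughly equal size, one per EC2 node, can be sketched as follows. This is an illustrative Python script, not a MAKER utility; the script name, the output names (`chunk_1.fasta`, ...) and the greedy longest-first assignment are all assumptions. It balances chunks by total bases rather than by number of sequences, which should keep per-node work closer when contig sizes vary widely, although MAKER run time is only roughly proportional to sequence length.

```python
import heapq
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs; the header keeps its leading '>'."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None and line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def partition(records, n_chunks):
    """Assign records, longest first, to whichever chunk currently has the fewest bases."""
    chunks = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bases assigned so far, chunk index)
    for header, seq in sorted(records, key=lambda rec: len(rec[1]), reverse=True):
        assigned, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (assigned + len(seq), idx))
    return chunks


def write_fasta(records, path, width=60):
    """Write records to path, wrapping sequence lines at `width` characters."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(header + "\n")
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python split_fasta.py genome.fasta 8
    with open(sys.argv[1]) as handle:
        records = list(read_fasta(handle))
    for i, chunk in enumerate(partition(records, int(sys.argv[2])), start=1):
        write_fasta(chunk, f"chunk_{i}.fasta")
```

Each resulting chunk file would then be used as the genome input on its own node, with MAKER parallelized by MPI within that node as Barry describes.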
From bhall7 at hawaii.edu Wed Sep 11 15:23:28 2013
From: bhall7 at hawaii.edu (Brian Hall)
Date: Wed, 11 Sep 2013 10:23:28 -1000
Subject: [maker-devel] Question about phase for CDS with start codon
Message-ID: <5230D140.7010804@hawaii.edu>

Aloha,

I'm working with a gff produced by maker. (I didn't run the program myself, but I believe it was version 2.24.)
Here are the lines in question:

scaffold00033	maker	CDS	729494	729949	.	-	2	ID=107343;Name=BDOR_005037-RC:cds:250;Parent=107334
scaffold00033	maker	start_codon	729947	729949	.	-	.	ID=107349;Name=BDOR_005037-RB:start1;Parent=107334

If I understand correctly, the start codon in this reverse-strand CDS is from position 729949 to 729947 -- the first three bases in the CDS. However, the phase value for the CDS is 2, which essentially skips the start codon. Downstream software (tbl2asn) is kicking up a "missing start codon" error.

I have several hundred such issues in the gff for a single genome. They generally only occur on reverse-strand CDSs. Any ideas?

Sincerest apologies if this is a duplicate question or if I've provided incomplete information. I am new at this. Thanks for your help!

--Brian

From ckuanglim at gmail.com Thu Sep 12 00:42:38 2013
From: ckuanglim at gmail.com (Chan Kuang Lim)
Date: Thu, 12 Sep 2013 13:42:38 +0800
Subject: [maker-devel] Exon Type in MAKER GFF Output

Dear Maker developers,

I have a question regarding the GFF output of MAKER. When we look at CDS and exon features, we cannot tell whether they are initial, internal, terminal or single. How can we capture the exon type from MAKER output?

Thanks,
Chan KL

From carsonhh at gmail.com Thu Sep 12 09:21:48 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 10:21:48 -0400
Subject: [maker-devel] Exon Type in MAKER GFF Output

That information is not explicit in GFF3 format. You have to capture all exons parented onto the mRNA, then sort them to identify whether each exon is 5-prime, 3-prime, internal, or a single exon.

--Carson
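Carson's recipe (collect the exons that share an mRNA parent, sort them, then label each by position) can be sketched in Python. The helper names are hypothetical, and the sketch assumes tab-separated GFF3 with `Parent` attributes, all features of a transcript on one strand, and no URL-escaping in column 9. "Initial" here means 5'-most in transcript orientation, so on the minus strand the exon with the highest coordinates comes first. Passing `feature_type="CDS"` labels coding segments instead; because an initial exon may be entirely UTR, exon and CDS labels for the same transcript can differ.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'} (no URL-unescaping)."""
    pairs = (p.split("=", 1) for p in column9.strip().strip(";").split(";") if "=" in p)
    return {key: value for key, value in pairs}


def classify_exons(gff_lines, feature_type="exon"):
    """Map (parent_id, start, end) -> 'initial', 'internal', 'terminal' or 'single'."""
    by_parent = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # MAKER GFF3 may end with an embedded FASTA section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        # A feature may list several parents, separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                by_parent[parent].append((start, end, strand))

    labels = {}
    for parent, features in by_parent.items():
        # Sort 5' -> 3' in transcript orientation (descending on the minus strand).
        features.sort(key=lambda f: f[0], reverse=(features[0][2] == "-"))
        last = len(features) - 1
        for i, (start, end, _strand) in enumerate(features):
            if last == 0:
                label = "single"
            elif i == 0:
                label = "initial"
            elif i == last:
                label = "terminal"
            else:
                label = "internal"
            labels[(parent, start, end)] = label
    return labels
```

Keying the result by `(parent, start, end)` keeps exons that are shared between alternative transcripts distinct, since the same exon can be initial in one isoform and internal in another.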
From carsonhh at gmail.com Thu Sep 12 10:27:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 11:27:44 -0400
Subject: [maker-devel] Question about phase for CDS with start codon
In-Reply-To: <5230D140.7010804@hawaii.edu>

I know there was an incorrect phase issue in a previous MAKER version that is now fixed, but I really doubt that is what is causing your error. What are you using to convert from GFF3 to tbl format before running tbl2asn? I'd start there. We can send you a GFF3-to-tbl converter if that will help.

--Carson
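One way to check whether Brian's recorded phases are internally consistent, independent of the GFF3-to-tbl converter Carson asks about, is to recompute them. Under the GFF3 definition, phase is the number of bases to remove from the 5' end of a CDS segment (the highest coordinate on the minus strand) to reach the first base of the next codon. A minimal Python sketch follows; the function names are hypothetical, and it assumes the CDS segments of one mRNA are given as integer tuples and that the 5'-most segment begins with a complete codon unless `first_phase` says otherwise. Recorded phases are taken as integers, not the strings found in column 8.

```python
def expected_phases(cds_segments, strand, first_phase=0):
    """Return {(start, end): phase} for the CDS segments of one mRNA.

    Segments are 1-based inclusive (start, end) pairs. The 5'-most segment
    (highest coordinates on the '-' strand) gets `first_phase`, which is 0
    when the model begins with a complete start codon.
    """
    # Walk the segments in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    phases, seen = {}, 0
    for start, end in ordered:
        # Bases to skip before the next whole codon begins in this segment.
        phases[(start, end)] = (first_phase - seen) % 3
        seen += end - start + 1
    return phases


def phase_mismatches(cds_records, strand, first_phase=0):
    """cds_records: (start, end, recorded_phase) tuples for one mRNA.

    Returns (start, end, recorded, expected) for every segment that disagrees.
    """
    expected = expected_phases([(s, e) for s, e, _ in cds_records], strand, first_phase)
    return [(s, e, p, expected[(s, e)])
            for s, e, p in cds_records if p != expected[(s, e)]]
```

For Brian's single minus-strand CDS, `phase_mismatches([(729494, 729949, 2)], "-")` returns `[(729494, 729949, 2, 0)]`: the segment is 456 bp (a multiple of 3) and begins with the start codon, so the expected phase is 0 rather than 2. Whether that mismatch is what triggers the tbl2asn error still depends on how the converter reads phase, as Carson notes.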
From marc.hoeppner at imbim.uu.se Fri Sep 13 03:15:29 2013
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Fri, 13 Sep 2013 10:15:29 +0200
Subject: [maker-devel] Maker pass-through behavior
Message-ID: <5232C9A1.3060709@imbim.uu.se>

Dear list,

I have started using Maker to explore its use for a number of genome projects we are planning to run. One of the tools we intend to incorporate into our pipeline is PASA (since we will be using Trinity, etc.). I would like to pass the (cleaned) output with predicted gene structures to Maker as pass-through annotation (I am optimistic that way...), but I noticed that doing so does not always result in the PASA gene model being incorporated into the final Maker annotation track. Sometimes it seems to be superseded by an Augustus/Maker model; sometimes the region stays empty (even though a protein alignment is present).

So my question is how Maker handles pass-throughs, exactly. Can it reject pass-throughs, or should it always use such models over any other data source? Is there any scenario where it wouldn't?

I understand that Maker uses some internal scoring system to estimate the accuracy of an annotation - could that be a reason? It would be a bit odd though, since a lift-over from chicken (to our bird genome) seems to support gene models produced by PASA, yet they are nowhere to be found in the final models.

And a related question: Is there comprehensive documentation where I can get more information on the internal decision-making process of Maker? Or do I have to dig into the code for that?

Cheers,
Marc

PS I have attached a screenshot of such an example - the green track is Maker with proteins + augustus (chicken models) + PASA pass-through of a cleaned-up gene structure file.
(Orange: Cleaned ORFs directly from PASA output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins and Trinity transcripts as EST evidence, Black: chicken lift-overs from EnsEMBL) A non-text attachment was scrubbed... Name: igv_snapshot.png Type: image/png Size: 50142 bytes From carsonhh at gmail.com Sun Sep 15 13:39:29 2013 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 15 Sep 2013 14:39:29 -0400 Subject: [maker-devel] Maker pass-through behavior In-Reply-To: <5232C9A1.3060709@imbim.uu.se> Message-ID: > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? pred_gff is treated the same as any other ab initio prediction: it is just one among several candidate gene models. The model that is kept is the one with the lowest AED score (lower means a better evidence match/support). Any model with AED=1 (i.e., no evidence support) will be rejected unless keep_preds=1 is set. There is also a second score, eAED, which takes the protein reading frame into account (protein evidence must be in the same reading frame as the gene model). A model with eAED=1 will also be rejected. > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? Possibly. Check the AED score of the pass-through model in the final MAKER GFF3. If you send me the GFF3 along with a list of the regions you are concerned about, I can tell you more. Also consider giving the PASA results to est_gff as well, to bias the scoring algorithm toward keeping those models (i.e., the model supports itself, which is reasonable since these models are EST-derived anyway and not just ab initio predictions).
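The selection rule Carson describes (lowest AED wins; AED=1 or eAED=1 is rejected unless keep_preds=1) can be illustrated with a simplified sketch. It computes a base-level AED as 1 - (SN + SP)/2, following the congruence measure in Eilbeck et al. (2009), where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by evidence. This is not MAKER's code: the function and class names are hypothetical, evidence is collapsed to one set of bases, eAED is supplied rather than computed, and it is assumed (the thread does not say) that keep_preds also overrides the eAED=1 rejection.

```python
"""Simplified sketch of AED-based model selection (not MAKER's implementation).

Assumptions: intervals are 1-based inclusive (start, end) pairs on one strand;
all evidence is collapsed into one set of covered bases; eAED is given by the
caller; keep_preds is assumed to override both AED=1 and eAED=1 rejection.
"""
from dataclasses import dataclass


def covered_bases(intervals):
    """Return the set of positions covered by 1-based inclusive intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Base-level AED = 1 - (SN + SP) / 2."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare against: treat as unsupported
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)  # share of evidence explained by the model
    specificity = overlap / len(model)     # share of the model backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


@dataclass
class Candidate:
    name: str
    aed: float
    eaed: float


def choose_model(candidates, keep_preds=False):
    """Keep the lowest-AED candidate, dropping AED=1 or eAED=1 unless keep_preds."""
    pool = [c for c in candidates
            if keep_preds or (c.aed < 1.0 and c.eaed < 1.0)]
    if not pool:
        return None
    return min(pool, key=lambda c: c.aed)
```

For example, a model spanning bases 1-100 against evidence spanning 51-150 overlaps by 50 bases, so SN = SP = 0.5 and AED = 0.5. In MAKER 2 GFF3 output the scores to inspect typically appear as `_AED` and `_eAED` attributes on the mRNA lines.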
Also the model_gff option will always keep an input model (with or without evidence support) and will only replace it with something else if that something else has a better AED score. > > > > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker? Or do I have to dig into the code for that? Look at these two papers --> Holt, C., and Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491. Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67. Thanks, Carson On 9/13/13 4:15 AM, "Marc P. Hoeppner" wrote: > Dear list, > > I have started using Maker to explore its use for a number of genome > projects we are planning on running. One of the tools we intend on > incorporating into our pipeline is PASA (Since we will be using Trinity > etc). The (cleaned) output with predicted gene structures I would like > to pass to Maker as pass-through annotation (I am optimistic that > way...) - but I noticed that doing so does not always result in the > incorporation of the PASA gene model into the final maker annotation > track. Sometimes it seems to be superseded by an Augustus/Maker model, > sometimes the region stays empty (even tho a protein alignment is present). > > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? > > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? It would be a > bit odd tho, since a lift-over from chicken (to our bird genome) seems > to support gene models produced by PASA, yet they are nowhere to be > found in the final models. 
> > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker? Or do I have to dig into the code for that? > > Cheers, > > Marc > > PS I have attached a screenshot of such an example - the green track is > Maker with proteins + augustus (chicken models) + PASA pass-through of a > cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA > output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins > and trinity transcripts as EST evidence, Black: chicken lift-overs from > EnsEMBL) > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From mhinsley at ebi.ac.uk Mon Sep 16 04:51:35 2013 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Mon, 16 Sep 2013 10:51:35 +0100 Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes In-Reply-To: <5232C9A1.3060709@imbim.uu.se> References: <5232C9A1.3060709@imbim.uu.se> Message-ID: <5236D4A7.6080303@ebi.ac.uk> Hello, I'm trying to get maker to run on MPI using several nodes. I have an installation set up by a colleague which includes maker 2.27 and openmpi-1.4.3. Previously it has only been used (here at EBI) with maker processes running on one node only, but I find that the job can wait a very long time before being scheduled by LSF. The command used to submit is like this (as per recommendations from systems; it uses 8 CPUs on each of 8 nodes):
export OMP_NUM_THREADS=64
bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1
and it requires the environment to be set up in ~/.bashrc for openMPI.
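The submission in this message requests 64 slots as 8 slots on each of 8 nodes, and the same total has to appear in both `bsub -n` and `mpirun.lsf -np`. A minimal sketch, assuming Python 3.8+ for `shlex.join`, that rebuilds that command line from a node count and slots per node so the three numbers cannot drift apart. The helper name and its parameters are hypothetical; the flags are copied from the command above, the trailing `2>&1` redirection is left to the shell, and nothing is submitted.

```python
"""Sketch: rebuild the LSF/OpenMPI submission line from nodes x slots per node.

Hypothetical helper; flags are copied from the command in this thread. It only
keeps -n, span[ptile=...] and mpirun.lsf -np consistent and returns a string.
"""
import shlex


def bsub_command(nodes, slots_per_node, mem_mb, queue="mpi", log="lsf_log"):
    total = nodes * slots_per_node  # total MPI ranks requested from LSF
    resources = f"rusage[mem={mem_mb}] && span[ptile={slots_per_node}]"
    argv = [
        "bsub", "-q", queue, "-M", str(mem_mb), "-R", resources,
        "-n", str(total), "-o", log, "-a", "openmpi",
        "mpirun.lsf", "-np", str(total), "-mca", "btl", "tcp,self", "maker",
    ]
    return shlex.join(argv)  # quotes the -R string for the shell


if __name__ == "__main__":
    print(bsub_command(nodes=8, slots_per_node=8, mem_mb=40000))
```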
This runs but produces a lot of errors like: DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../lib/GFFDB.pm line 407. I've looked at https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/maker-devel/TscBgbQfBX4/pae016DqlIMJ which suggests that "It means that your GFF3 results will not be integrated" (I'm not sure what's meant by that, but the number of genes I'm getting is around 2k, and I expect more like 15k) and that the problem is SQLite running on NFS (a known issue), and that the fix is to use /tmp. I have TMP= set as per the default in maker_opts.ctl, and there are maker directories in /tmp on the runtime nodes, but the database (I guess) is in /nfs/...../maker//.scf.db. I don't see how I could set the working directory to a non-NFS file system and still use more than one node, but this error only seems to appear (so far) with est2genome, not when running SNAP/Augustus. Is there a workaround to stop getting the locked error, or some way to recover from it after maker has finished? Or is it necessary to run the est2genome step (or maker generally) on one node? An obvious option is to split the assembly, but I was hoping to avoid that. -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom From carsonhh at gmail.com Tue Sep 17 22:35:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Sep 2013 21:35:52 -0600 Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes In-Reply-To: <5236D4A7.6080303@ebi.ac.uk> Message-ID: Sorry for the slow reply; I'm currently traveling. Try deleting any *.db files in the maker output directory to force the SQLite database to be rebuilt.
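Carson's suggestion of deleting the *.db files so that MAKER rebuilds the SQLite database can be scripted. A minimal sketch, assuming you pass it the MAKER output directory (its exact name is not given here) and that only the top level needs to be searched. It defaults to a dry run that just lists what it would delete; hidden files such as the `.scf.db` mentioned above are matched too. The function name is hypothetical.

```python
"""Sketch: list (and optionally delete) *.db files in a MAKER output directory.

Assumptions: `output_dir` is the MAKER output directory; only its top level is
searched; dry_run=True (the default) deletes nothing.
"""
from pathlib import Path


def remove_sqlite_dbs(output_dir, dry_run=True):
    """Return the *.db files found; unlink them only when dry_run is False."""
    found = sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.endswith(".db")  # also matches hidden ".scf.db"
    )
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Running it once with the default and checking the returned list before passing `dry_run=False` avoids deleting anything unexpected.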
Also, you can try the current version of MAKER at yandell-lab.org. MAKER is supposed to try to copy the database to the /tmp directory before it starts work. That way the actual working copy will be local, and independent for each node. --Carson From: Malcolm Hinsley Date: Monday, September 16, 2013 3:51 AM To: Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes Hello I'm trying to get maker to run on MPI using several nodes. I have an installation set up by a colleague which includes maker 2.27 and openmpi-1.4.3. Previously it has only been used (here at EBI) with maker processes running on one node only, but i find that it can wait a very long time before being scheduled by LSF. The command used to submit is like this (as per recommendations from systems) (uses 8 cpus on each of 8 nodes) export OMP_NUM_THREADS=64 bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1 and requires environment be set up in ~/.bashrc for openMPI. This runs but produces a lot of errors like: DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../lib/GFFDB.pm line 407. I've looked at https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/maker-devel/TscBgbQfBX4/pae016DqlIMJ which suggests that "It means that your GFF3 results will not be integrated" (but i'm not sure what's meant by that, but the number of genes i'm getting is around 2k, expect more like 15k) and that the problem is SQLite using NFS (a known issue), and the fix is to use /tmp. I have TMP= set as per default in maker_opts.ctl, and there are maker directories in /tmp on the runtime nodes, but the database (i guess) is in /nfs/...../maker//.scf.db.
I don't see how i could set the working directory to a non-NFS file systems and still use more than one node, but this error only seems to appear (so far) with est2genome, not when running SNAP/ Augustus. Is there a work around to stop getting the locked error or some way to recover from it after maker has finished? Or is it necessary to run the est2genome step (or maker generally) on one node? An obvious option is to split the assembly but i was hoping to avoid that. -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Sep 17 22:57:12 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Sep 2013 21:57:12 -0600 Subject: [maker-devel] Unexpected results with correct_est_fusion In-Reply-To: Message-ID: It does sound like this is likely the result of gene fusion from the Trinity assemblies. One thing to look at is the number of coding exons compared to the other ant species: see whether the increase in exons is mostly in UTR, coding sequence, or both. One thing you could try is running MAKER without the EST evidence, just to see how many genes you get with protein-only support. There are ways to use multiple MAKER runs to tease out details of the data. For example: run1: protein evidence only, plus ab initio predictors like snap and augustus. run2: protein and EST evidence, with the models from run1 passed in as pred_gff and snap and augustus turned off (this will force the addition of UTR, but not the generation of new models). Use the correct_est_fusion=1 option here to clip UTR that runs into neighboring genes.
run3: protein and EST evidence plus augustus and snap. Then take the models from run2, plus the models from run3 that do not overlap run2, and add them all to your final set along with any models that come from InterProScan domain analysis of rejected models. This solution is rather lengthy, but it may avoid many of the problems you seem to be getting with gene merging even with jaccard_clip and correct_est_fusion turned on, because your ESTs would only contribute to the UTR and to models not found based solely on protein evidence (i.e., they would be ignored in cases where you get enough evidence from other sources). --Carson From: Benjamin Rubin Date: Tuesday, September 17, 2013 10:08 AM To: Carson Holt Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, The new version is working great. Thanks for your help. I do have another more general question. I am working on annotating a new ant genome (Pseudomyrmex gracilis) and the results that I am getting from MAKER are a bit unexpected. The number of genes produced by MAKER is ~14,300 while, as you may know, the seven published ant genomes have at least 16,000 genes (this number was improved by several hundred by turning on correct_est_fusion). Running the ab initio predictions through InterProScan yields ~900 additional genes for P. gracilis so there are still substantially fewer genes found for this species. This difference on its own is not that unexpected; Pseudomyrmex likely diverged from the other sequenced ants by over 100 million years and the genome sequence itself is rather fragmented and incomplete. However, what is bothering me is that, despite having fewer genes, I am seeing substantially larger numbers of exons (~92,000 as opposed to 78-85,000) and the total length of all proteins is more than a million amino acids longer in P. gracilis. It does not have unexpectedly long genes, the average gene length is just a bit higher.
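The comparisons discussed here (Carson's check of coding exons versus other ant species and of whether the extra exons fall in UTR or coding sequence, plus Ben's gene, exon and total protein length figures) can be tallied from a GFF3 file. A rough sketch, assuming plain nine-column GFF3 with any `##FASTA` section at the end. Total protein length is approximated as summed CDS bases // 3 per mRNA, so stop codons and partial codons are not handled. Comparing the exon count with the CDS count gives a first hint of how much of an increase is UTR. The function name is hypothetical.

```python
"""Sketch: summarize a GFF3 annotation to compare runs or species.

Assumptions: 9 tab-separated columns; a '##FASTA' line ends the features;
protein length ~ summed CDS length // 3 per mRNA (stops/partials ignored).
"""
from collections import Counter, defaultdict


def summarize_gff3(lines):
    counts = Counter()
    cds_bases = defaultdict(int)  # mRNA ID -> summed CDS length in bases
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        counts[ftype] += 1
        if ftype == "CDS":
            fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
            for parent in fields.get("Parent", "").split(","):
                if parent:
                    cds_bases[parent] += end - start + 1
    return {
        "genes": counts["gene"],
        "mRNAs": counts["mRNA"],
        "exons": counts["exon"],
        "CDS": counts["CDS"],
        "protein_aa": sum(n // 3 for n in cds_bases.values()),
    }
```

Usage: `summarize_gff3(open("annotation.gff"))` (hypothetical file name) returns the five tallies for one run.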
I have looked at the annotations of some conserved genes and found some apparently spurious exons merged with these genes. I say that they are spurious because they go beyond the end of the gene sequence in other species (ants and Drosophila). Unfortunately, it appears that many of these spurious calls are primarily the result of blast hits to my EST data. The ESTs generally seem to blast to the genome a bit more often than expected. Partly as a result of the relatively high repeat content of my genome (~50% complex repeats) and partly because we only used two Illumina libraries, my genome sequence is quite fragmented (~280Mb in ~6,500 scaffolds). Note that the total genome length is estimated at 387Mb, so I am missing a fair amount but almost all CEGMA genes are present in the assembly so I have concluded that the missing sequence is predominantly repeats. I have no prior reason to expect that my EST library has anything wrong with it. I did a single Illumina lane of RNA-seq and assembled in Trinity with the jaccard_clip option on to reduce gene fusions. If you have any advice on how my gene predictions can be improved, I would really appreciate it. Have you heard of this kind of problem before? Is there a way to limit the influence of ESTs without discarding them entirely? Thanks so much for your help with the fusion bug and for any advice here. Ben On Wed, Sep 11, 2013 at 9:27 AM, Benjamin Rubin wrote: > Hi Carson, > > OK, I will try it and let you know how it goes. And thanks for the suggestion > about using always_complete as well. > > Thanks! > Ben > > > On Tue, Sep 10, 2013 at 9:45 PM, Carson Holt wrote: >> I think I have it fixed. Sorry it took so long, but my original fix actually >> created other odd behaviors so I had to track those down as well. >> >> You can download the test version with the fix by typing this on the command >> line --> >> >> svn co ********* >> >> user: ***** >> password: ***** >> >> Test it out and let me know. 
On the contig you sent me, I also set >> always_complete=1 as some of the hint based models were lacking start or stop >> codons. The results looked slightly better that way as well. >> >> Thanks, >> Carson >> >> >> >> From: Benjamin Rubin >> Date: Wednesday, September 4, 2013 10:07 AM >> To: Carson Holt >> >> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >> >> OK, great. Thanks for letting me know. >> >> Ben >> >> >> On Wed, Sep 4, 2013 at 9:00 AM, Carson Holt wrote: >>> I thought I'd give you an update on this. I've verified the bug and think >>> I've identified roughly where it's happening. I'll have a fix for you to >>> test soon. >>> >>> --Carson >>> >>> >>> From: Benjamin Rubin >>> >>> Date: Wednesday, August 28, 2013 4:16 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>> >>> Hi Carson, >>> >>> OK, I think I uploaded all of the necessary files. I made a directory named >>> "rubin_data" for everything. I included both the full genome file >>> ("ec_patch...") as well as a file for scaffold_1. For this scaffold, I get >>> 132 genes when correct_est_fusion is off and 35 when it is on. These results >>> are after running maker a first time with correct_est_fusion on and >>> retraining SNAP/Augustus on the results. The SNAP file is >>> "gracilis_round_1.hmm" and I think the necessary Augustus files are in the >>> "gracilis_jaccard_flank100_corrfusion_round_1_results" directory. I also >>> included gff files for scaffold_1 with and without correct_est_fusion turned >>> on. >>> >>> Let me know if there is anything else that I failed to upload. I really >>> appreciate your time. Thanks so much. >>> >>> Ben >>> >>> >>> On Wed, Aug 28, 2013 at 9:59 AM, Benjamin Rubin >>> wrote: >>>> Hi Carson, >>>> >>>> Yes, I would be happy to upload the necessary data. Just let me know the >>>> connection information. >>>> >>>> Thanks! 
>>>> Ben >>>> >>>> >>>> On Wed, Aug 28, 2013 at 8:09 AM, Carson Holt wrote: >>>>> Could you pick one contig where the number of genes shift dramatically and >>>>> upload that contig fasta together with your control files and any evidence >>>>> datasets used to one of our servers (I'm going to send you connection >>>>> details in a separate e-mail). I can then run with and without >>>>> correct_est_fusion to see if there is anything unexpected going on. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> >>>>> From: Benjamin Rubin >>>>> Date: Tuesday, August 27, 2013 10:59 AM >>>>> To: Carson Holt >>>>> Cc: >>>>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>>>> >>>>> Hi Carson, >>>>> >>>>> I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but >>>>> I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank >>>>> at 100). This is using only the first round with SNAP and Augustus trained >>>>> on the CEGMA genes. Is there anything else that I might be doing wrong? I >>>>> have attached my control file in case that could be useful. >>>>> >>>>> Thanks for the help! >>>>> Ben >>>>> >>>>> >>>>> On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: >>>>>> The correct_est_fusion option just clips UTR on overlapping genes. I >>>>>> suspect the real problem is setting pred_flank too low. If your lead in >>>>>> sequence to a gene is too short, ab initio predictors won't call it. So >>>>>> you are probably getting empty reports from SNAP/Augustus for the hint >>>>>> based predictions. Try increasing pred_flank to at least 150. Setting >>>>>> pred_flank too low will also limit how far MAKER will walk out along the >>>>>> edges initial alignments during the polishing step (exonerate). So >>>>>> setting it too low may also be causing you to lose some EST and protein >>>>>> alignments. 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Benjamin Rubin >>>>>> Date: Monday, August 26, 2013 2:20 PM >>>>>> To: >>>>>> Subject: [maker-devel] Unexpected results with correct_est_fusion >>>>>> >>>>>> Hello developers, >>>>>> >>>>>> I am using MAKER 2.28 to annotate an ant genome. I provide protein >>>>>> sequence evidence from all seven of the other sequenced ant genomes and a >>>>>> de novo assembled transcriptome as EST evidence. I assembled the >>>>>> transcriptome using Trinity with the jaccard_clip option turned on to >>>>>> reduce gene fusions. Despite using this set of hopefully non-fused ESTs, >>>>>> I still have substantial fusion problems with the final annotation. >>>>>> Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. >>>>>> However, correct_est_fusion leads to the prediction of a much smaller >>>>>> number of genes (~5,000 instead of ~14,000). I am initially training both >>>>>> SNAP and Augustus using CEGMA genes and then retraining based on the >>>>>> first round of annotation. Both rounds of annotation yield the same low >>>>>> number (~5,000) of genes. It may also be worth mentioning that the number >>>>>> of exons is also far lower when using correct_est_fusion (~26,000 instead >>>>>> of ~90,000). >>>>>> >>>>>> Is this the expected behavior of correct_est_fusion? I was surprised that >>>>>> it reduced the predicted number of genes by such a large margin. I am >>>>>> concerned that I am using it incorrectly. Do you have any other >>>>>> suggestions for reducing gene merging? 
>>>>>> >>>>>> Thanks, >>>>>> Ben >>>>>> >>>>>> -- >>>>>> _____________________________________________________ >>>>>> Benjamin ER Rubin >>>>>> PhD Candidate >>>>>> Committee on Evolutionary Biology >>>>>> University of Chicago >>>>>> http://www.moreaulab.org/Benjamin_Rubin.html >>>>>> >>>>>> Division of Insects >>>>>> Zoology Department >>>>>> Field Museum of Natural History >>>>>> 1400 South Lake Shore Drive >>>>>> Chicago, IL 60605 >>>>>> USA >>>>>> Office: (312) 665-7776 >>>>>> _______________________________________________ maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> -- >>
_____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 From leshin at gmail.com Wed Sep 18 14:35:10 2013 From: leshin at gmail.com (Le-Shin Wu) Date: Wed, 18 Sep 2013 15:35:10 -0400 Subject: [maker-devel] running mpi MAKER Message-ID: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Hi, I am new to MAKER and just started to use MAKER for doing some genome annotations. I compiled the MAKER package with an MPI-supported configuration on our cluster. But when I used the "mpiexec -n 64 -hostfile $PBS_NODEFILE maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl" command to run my MPI MAKER job, I got a whole bunch of warning messages like the one shown below in my error log file. I wonder whether this warning message means something is wrong? Thank you. (I request 64 processors on two nodes) STATUS: Processing and indexing input FASTA files... WARNING: Multiple MAKER processes have been started in the same directory.
Best LW From carsonhh at gmail.com Wed Sep 18 15:27:32 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 18 Sep 2013 14:27:32 -0600 Subject: [maker-devel] running mpi MAKER In-Reply-To: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Message-ID: It means either MAKER was not properly configured for MPI support, or the communication ring is not launching properly. Three things: 1. In the .../maker/src/ directory, run './Build status'. Does it say MPI_SUPPORT is configured or installed? 2. Run 'which mpiexec' on the command line. What is the path? Is it MPICH2 mpiexec, OpenMPI, or something else? 3. Run 'mpiexec -n 64 -hostfile $PBS_NODEFILE hostname' on the command line. What does it print out? Thanks, Carson On 9/18/13 1:35 PM, "Le-Shin Wu" wrote: >Hi, > >I am new to MAKER and just started to use MAKER for doing some genome >annotations. I compiled MAKER package with mpi-supported configuration on >our cluster. But when I used "mpiexec -n 64 -hostfile $PBS_NODEFILE maker >maker_opts.ctl maker_bopts.ctl maker_exe.ctl" command to run my MPI MAKER >job, I got whole bunch of warring message as shown below in my error log >file. I wonder is there anything wrong with this warring message? Thank >you. (I request 64 processors on two nodes) > >STATUS: Processing and indexing input FASTA files... >WARNING: Multiple MAKER processes have been started in the >same directory. > > >Best > >LW >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From lewu at indiana.edu Wed Sep 18 20:30:49 2013 From: lewu at indiana.edu (Le-shin Wu) Date: Wed, 18 Sep 2013 21:30:49 -0400 Subject: [maker-devel] running mpi MAKER In-Reply-To: References: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Message-ID: Hi Carson, Thanks a lot for your information. When I run './Build status', it shows the output below, and it looks like MPI SUPPORT is enabled.
============================================================================== STATUS MAKER 2.27 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: ENABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK But when I run 'which mpiexec' it shows "/N/soft/mason/openmpi/1.5.4/gcc/bin/mpiexec". So I think I did not use the correct version of mpiexec while running my MAKER job. Thanks again. I will try my MAKER job again with the correct mpiexec from mpich2. Best LW ____________________________________________ Le-Shin Wu Center for Computational Cytomics, Indiana University http://www.cs.indiana.edu/~lewu ____________________________________________ On Wed, Sep 18, 2013 at 4:27 PM, Carson Holt wrote: > It means either maker as not properly configured for MPI support, or the > communication ring is not launching properly. > > Three things: > 1. In the .../maker/src/ directory, run './Build status'. Does it say > MPI_SUPPORT is configured or installed? > 2. Run 'which mpiexec' on the command line? What is the path? Is is > MPICH2 mpiexec, or OpenMPI, or something else? > 3. Run 'mpiexec -n 64 -hostfile $PBS_NODEFILE hostname' on the command > line. What does it print out? > > Thanks, > Carson > > > On 9/18/13 1:35 PM, "Le-Shin Wu" wrote: > > >Hi, > > > >I am new to MAKER and just started to use MAKER for doing some genome > >annotations. I compiled MAKER package with mpi-supported configuration on > >our cluster. But when I used "mpiexec -n 64 -hostfile $PBS_NODEFILE maker > >maker_opts.ctl maker_bopts.ctl maker_exe.ctl" command to run my MPI MAKER > >job, I got whole bunch of warring message as shown below in my error log > >file. I wonder is there anything wrong with this warring message? Thank > >you. (I request 64 processors on two nodes) > > > >STATUS: Processing and indexing input FASTA files... 
> >WARNING: Multiple MAKER processes have been started in the > >same directory. > > > > > >Best > > > >LW > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > From mhinsley at ebi.ac.uk Thu Sep 19 10:37:17 2013 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Thu, 19 Sep 2013 16:37:17 +0100 Subject: [maker-devel] 2.27 and 2.28 incompatible Message-ID: <523B1A2D.7020300@ebi.ac.uk> To try to fix SQL lock file errors I installed 2.28 and made the mistake of running on a directory made by 2.27 (to run snap and augustus for the first time). Every contig fails due to errors like: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't open file 04723bc0c22478764d90bbaebca96d23 STACK: Error::throw STACK: Bio::Root::Root::throw /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/Root/Root.pm:472 STACK: Bio::DB::Fasta::fh /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:948 STACK: Bio::DB::Fasta::subseq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:929 STACK: Bio::PrimarySeq::Fasta::seq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:1089 STACK: FastaSeq::seq /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm:50 STACK: Process::MpiChunk::_go /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:478 STACK: Process::MpiChunk::run /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all
/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm:286 STACK: /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker:667 ----------------------------------------------------------- --> rank=NA, hostname=ebi3-198.ebi.ac.uk at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 38 Error::_throw_Error_Simple('HASH(0x388cb78)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 306 Error::subs::run_clauses('HASH(0x388cbf0)', '\x{a}------------- EXCEPTION: Bio::Root::Exception -------------\x{a}...', undef, 'ARRAY(0x38a0d18)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x38f93f8)', 'HASH(0x388cbf0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm line 95 FastaSeq::seq('FastaSeq=HASH(0x388dda0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 478 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x38a0e50)', 'run', 'HASH(0x38a0ec8)', 0, 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 357 Process::MpiChunk::run_all('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm line 286 Process::MpiTiers::run_all('Process::MpiTiers=HASH(0x3867960)', 0) called at /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker line 667 Is there an easy way to reset the datastore/ file names so that I can switch over to 2.28 without starting over? (e.g. maker -dsindex) I killed the job and ran 2.27 instead, which seems to be jim dandy.
-- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom
From carsonhh at gmail.com Thu Sep 19 11:06:09 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 10:06:09 -0600 Subject: [maker-devel] 2.27 and 2.28 incompatible In-Reply-To: <523B1A2D.7020300@ebi.ac.uk> Message-ID: There is something very odd, because I've never seen those errors before, and 2.28 should use the same datastore structure as 2.27. I'm going to write a script that will print out certain configuration information about your install that might help me see what's going on. My plane is boarding now, so I'll send it to you later this evening. Thanks, Carson On 9/19/13 9:37 AM, "Malcolm Hinsley" wrote:
From myandell at genetics.utah.edu Thu Sep 19 12:53:48 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Thu, 19 Sep 2013 17:53:48 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Hi Corban & Xia, I've forwarded your question along to the MAKER_dev list, where you can get speedy answers to your MAKER-related questions. Thanks for using MAKER. --mark Mark Yandell Professor of Human Genetics H.A.
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] Sent: Thursday, September 19, 2013 11:49 AM To: Mark Yandell; Corban-Gregory.Rivera at dupont.com Subject: maker2 scripts for functional annotation Dr. Yandell, We were recently evaluating maker2 for annotation and going through the MAKER tutorial from 2012. http://gmod.org/wiki/MAKER_Tutorial_2012 The tutorial refers to some scripts that we couldn't find in the current release. We were looking for scripts like gff3_preds2models to convert match/match_part format into annotations with gene/mRNA/exons/CDS and others. I was wondering if maybe we did not have the most up-to-date version. In addition to getting accurate gene annotations, I was looking for a solution to get functional assignments. I see that there are some scripts like maker_functional_fasta that may help, but I was wondering what you would recommend. Thanks, Corban & Xia This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
The dupont.com web address will continue in use for a transitional period for communications sent or received on behalf of DuPont Performance Coatings., which is not affiliated in any way with the DuPont Company. Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean http://www.DuPont.com/corp/email_disclaimer.html
From carsonhh at gmail.com Thu Sep 19 16:58:16 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 15:58:16 -0600 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hello Corban & Xia, Some scripts like gff3_preds2models are deprecated. To get the same result that gff3_preds2models offered, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS. For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts, and then to add putative gene functions via BLASTP against UniProt and the maker_functional_fasta and maker_functional_gff scripts. Go ahead and take a look at those tools, and let me know if you have any questions or need help configuring them. Thanks, Carson On 9/19/13 11:53 AM, "Mark Yandell" wrote:
From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 06:49:40 2013 From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL)) Date: Wed, 25 Sep 2013 11:49:40 +0000 Subject: [maker-devel] Path and contents of RepBase Message-ID: Hi, I'm getting the following error when I run maker v2.28: WARNING: RepBase is not installed for RepeatMasker. This limits RepeatMasker's functionality and makes the model_org option in the control files virtually meaningless. MAKER will now reconfigure for simple repeat masking only. In maker_opts.ctl I have: model_org=all In maker_exe.ctl I have: RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker Instructions in the GMOD maker tutorial state: "Unpack the contents of the RepBase tarball into the RepeatMasker/Libraries directory." So, I have RepBase located as follows: /RepeatMasker/4.0.3/x86_64/bin/Libraries/ The content of this directory is: RepBase18.08.embl/ RepBase18.08.fasta/ Could someone tell me how/where maker looks for RepBase and which files (embl? fasta? something else?) I need in there? Many thanks for your help, Graham Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH.
UK Tel: +44 (0)1603 450601 From carsonhh at gmail.com Wed Sep 25 09:13:40 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:13:40 -0400 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: It's not MAKER that looks for RepBase, it is Repeatmasker. MAKER is just letting you know verbally that you don't have it installed, so you are not surprised by the lack of results RepeatMasker gives you. You must Download RepBase separately from Repeatmasker. When you unpack it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file as well as other files in the .../RepeatMasker/Libraries/ directory. The header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will tell you if it is the minimal library or the full RepBase library. You have also downloaded the incorrect format since you have directories named RepBase18.08.embl. You need to go to http://www.girinst.org/server/RepBase/index.php and download the RepeatMasker edition and not the EMBL format one. The contents should be named exactly .../Libraries/RepeatMaskerLib.embl. Here is a direct link --> http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repea tmaskerlibraries-20130422.tar.gz Make sure you are in the .../RepeatMasker/ directory before unpacking the tar ball, or you won't get the proper file replacement behavior. See Repeatmasker installation instructions here --> http://www.repeatmasker.org/RMDownload.html Thanks, Carson On 9/25/13 7:49 AM, "graham etherington (TSL)" wrote: >Hi, >I'm getting the following error when I run maker v2.28: >WARNING: RepBase is not installed for RepeatMasker. This limits >RepeatMasker's functionality and makes the model_org option in the >control files virtually meaningless. MAKER will now reconfigure >for simple repeat masking only. 
> > > >In maker_opts.clt I have: >model_org=all >In maker_exe.ctl I have: >RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker > >Instructions in the GMOD maker tutorial state: >"Unpack the contents of the RepBase tarball into the >RepeatMasker/Libraries directory." > > >So, I have RepBase located as follows: > >/RepeatMasker/4.0.3/x86_64/bin/Libraries/ >The content of this directory is: >RepBase18.08.embl/ >RepBase18.08.fasta/ > >Could someone tell me how/where maker looks for REPBase and which files >(embl? fasta? something else?) I need in there? > >Many thanks for your help, >Graham > > >Dr. Graham Etherington >Bioinformatics Support Officer, >The Sainsbury Laboratory, >Norwich Research Park, >Norwich NR4 7UH. >UK >Tel: +44 (0)1603 450601 > > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 09:29:53 2013 From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL)) Date: Wed, 25 Sep 2013 14:29:53 +0000 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: Hi Carson, Many thanks for the explanation of how RepBase works. I followed your instructions and maker no longer complains. Thanks for your help, Graham Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH. UK Tel: +44 (0)1603 450601 On 25/09/2013 15:13, "Carson Holt" wrote: >It's not MAKER that looks for RepBase, it is Repeatmasker. MAKER is just >letting you know verbally that you don't have it installed, so you are not >surprised by the lack of results RepeatMasker gives you. > >You must Download RepBase separately from Repeatmasker. When you unpack >it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file >as well as other files in the .../RepeatMasker/Libraries/ directory. 
The >header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will >tell you if it is the minimal library or the full RepBase library. > >You have also downloaded the incorrect format since you have directories >named RepBase18.08.embl. You need to go to >http://www.girinst.org/server/RepBase/index.php and download the >RepeatMasker edition and not the EMBL format one. The contents should be >named exactly .../Libraries/RepeatMaskerLib.embl. > >Here is a direct link --> >http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repe >a >tmaskerlibraries-20130422.tar.gz > > >Make sure you are in the .../RepeatMasker/ directory before unpacking the >tar ball, or you won't get the proper file replacement behavior. > >See Repeatmasker installation instructions here --> >http://www.repeatmasker.org/RMDownload.html > >Thanks, >Carson > > > >On 9/25/13 7:49 AM, "graham etherington (TSL)" > wrote: > >>Hi, >>I'm getting the following error when I run maker v2.28: >>WARNING: RepBase is not installed for RepeatMasker. This limits >>RepeatMasker's functionality and makes the model_org option in the >>control files virtually meaningless. MAKER will now reconfigure >>for simple repeat masking only. >> >> >> >>In maker_opts.clt I have: >>model_org=all >>In maker_exe.ctl I have: >>RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker >> >>Instructions in the GMOD maker tutorial state: >>"Unpack the contents of the RepBase tarball into the >>RepeatMasker/Libraries directory." >> >> >>So, I have RepBase located as follows: >> >>/RepeatMasker/4.0.3/x86_64/bin/Libraries/ >>The content of this directory is: >>RepBase18.08.embl/ >>RepBase18.08.fasta/ >> >>Could someone tell me how/where maker looks for REPBase and which files >>(embl? fasta? something else?) I need in there? >> >>Many thanks for your help, >>Graham >> >> >>Dr. Graham Etherington >>Bioinformatics Support Officer, >>The Sainsbury Laboratory, >>Norwich Research Park, >>Norwich NR4 7UH. 
>>UK >>Tel: +44 (0)1603 450601 >> >> >> >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Wed Sep 25 09:32:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:32:33 -0400 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: Glad it worked. If you have any other question, just let us know. Thanks, Carson On 9/25/13 10:29 AM, "graham etherington (TSL)" wrote: >Hi Carson, >Many thanks for the explanation of how RepBase works. I followed your >instructions and maker no longer complains. >Thanks for your help, >Graham > >Dr. Graham Etherington >Bioinformatics Support Officer, >The Sainsbury Laboratory, >Norwich Research Park, >Norwich NR4 7UH. >UK >Tel: +44 (0)1603 450601 > > > > > >On 25/09/2013 15:13, "Carson Holt" wrote: > >>It's not MAKER that looks for RepBase, it is Repeatmasker. MAKER is just >>letting you know verbally that you don't have it installed, so you are >>not >>surprised by the lack of results RepeatMasker gives you. >> >>You must Download RepBase separately from Repeatmasker. When you unpack >>it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file >>as well as other files in the .../RepeatMasker/Libraries/ directory. The >>header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will >>tell you if it is the minimal library or the full RepBase library. >> >>You have also downloaded the incorrect format since you have directories >>named RepBase18.08.embl. You need to go to >>http://www.girinst.org/server/RepBase/index.php and download the >>RepeatMasker edition and not the EMBL format one. The contents should be >>named exactly .../Libraries/RepeatMaskerLib.embl. 
>> >>Here is a direct link --> >>http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/rep >>e >>a >>tmaskerlibraries-20130422.tar.gz >> >> >>Make sure you are in the .../RepeatMasker/ directory before unpacking the >>tar ball, or you won't get the proper file replacement behavior. >> >>See Repeatmasker installation instructions here --> >>http://www.repeatmasker.org/RMDownload.html >> >>Thanks, >>Carson >> >> >> >>On 9/25/13 7:49 AM, "graham etherington (TSL)" >> wrote: >> >>>Hi, >>>I'm getting the following error when I run maker v2.28: >>>WARNING: RepBase is not installed for RepeatMasker. This limits >>>RepeatMasker's functionality and makes the model_org option in the >>>control files virtually meaningless. MAKER will now reconfigure >>>for simple repeat masking only. >>> >>> >>> >>>In maker_opts.clt I have: >>>model_org=all >>>In maker_exe.ctl I have: >>>RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker >>> >>>Instructions in the GMOD maker tutorial state: >>>"Unpack the contents of the RepBase tarball into the >>>RepeatMasker/Libraries directory." >>> >>> >>>So, I have RepBase located as follows: >>> >>>/RepeatMasker/4.0.3/x86_64/bin/Libraries/ >>>The content of this directory is: >>>RepBase18.08.embl/ >>>RepBase18.08.fasta/ >>> >>>Could someone tell me how/where maker looks for REPBase and which files >>>(embl? fasta? something else?) I need in there? >>> >>>Many thanks for your help, >>>Graham >>> >>> >>>Dr. Graham Etherington >>>Bioinformatics Support Officer, >>>The Sainsbury Laboratory, >>>Norwich Research Park, >>>Norwich NR4 7UH. 
>>>UK >>>Tel: +44 (0)1603 450601 >>> >>> >>> >>> >>>_______________________________________________ >>>maker-devel mailing list >>>maker-devel at box290.bluehost.com >>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > From carsonhh at gmail.com Wed Sep 25 09:35:46 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:35:46 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: If it is launching predictors then you have snap hmm or augustus_species set. You ned to blank out all other options in the control files (including repeat masking options, proteins, ESTs, etc.) when trying to convert mathc/match_part to gene/mRNA/exons/CDS, or else those other programs will run. --Carson On 9/25/13 10:31 AM, "Xia.Cao at dupont.com" wrote: >Hi Carson, > >Thank you for the message and your kind help. We tested maker2 by setting >keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun . But it >seemed maker2 started to launch all predictors again and it took long >time to finish. I wonder if there is any way that we can directly get >gene/mRNA/exons/CDS gff file without re-running maker2 to convert >match/match_part features into gene/mRNA/exons/CDS. > >Thanks, >Xia > >-----Original Message----- >From: Carson Holt [mailto:carsonhh at gmail.com] >Sent: Thursday, September 19, 2013 5:58 PM >To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY; >maker-devel at yandell-lab.org >Subject: Re: [maker-devel] maker2 scripts for functional annotation > >Hello Corban & Xia, > >Some scripts like gff3_preds2models are deprecated. To get the same >result as was offered by gff3_preds2models, just give your >match/match_part features to pref_gff= in the maker_opts.ctl file, set >keep_preds=1, and run with all other options and predictors turned off. >The final MAKER result will be your match/match_part features converted >into gene/mRNA/exons/CDS. 
> >For functional annotation, you can use Interproscan, BLASTP against >UniProt, or BALST2GO. My preference is to use InterProScan to add GO >terms and proteins domains via the ipr_update_gff and iprscan2gff3 >scripts. Then add putative gene functions via BLASTP to UniProt and >maker_functional_fasta and maker_functional_gff scripts. > >Go ahead and take a look and that those tools and let me know if you have >any questions or need help you configuring them. > >Thanks, >Carson > > >On 9/19/13 11:53 AM, "Mark Yandell" wrote: > >>Hi Corban & Xia, >> >> >>I've forwarded your question along to the MAKER_dev list, were you can >>get speedy answers to your maker related questions. Thanks for using >>MAKER. >> >>--mark >> >> >>Mark Yandell >>Professor of Human Genetics >>H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of >>Human Genetics University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>ph:801-587-7707 >> >>________________________________________ >>From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] >>Sent: Thursday, September 19, 2013 11:49 AM >>To: Mark Yandell; Corban-Gregory.Rivera at dupont.com >>Subject: maker2 scripts for functional annotation >> >>Dr. Yandell, >> >>We were recently evaluating maker2 for annotation and going through the >>maker tutorial from 2012. >> >>http://gmod.org/wiki/MAKER_Tutorial_2012 >> >>The tutorial makes references to some scripts that we couldn?t find in >>the current release. We were looking for scripts like >>gff3_preds2models to convert match/match_part format into annotations >>with gene/mRNA/exons/CDS and others. I was wondering if maybe we did >>not have the most up to date version. >> >>In addition to getting accurate gene annotations, I was looking for a >>solution to get functional assignments. I see that there are some >>scripts like maker_functional_fasta that may help, but I was wondering >>what you would recommend. 
>> >>Thanks, >> >>Corban & Xia >> >>This communication is for use by the intended recipient and contains >>information that may be Privileged, confidential or copyrighted under >>applicable law. If you are not the intended recipient, you are hereby >>formally notified that any use, copying or distribution of this e-mail, >>in whole or in part, is strictly prohibited. Please notify the sender >>by return e-mail and delete this e-mail from your system. Unless >>explicitly and conspicuously designated as "E-Contract Intended", this >>e-mail does not constitute a contract offer, a contract amendment, or >>an acceptance of a contract offer. This e-mail does not constitute a >>consent to the use of sender's contact information for direct marketing >>purposes or for transfers of data to third parties. >> >>The dupont.com web address will continue in use for a transitional >>period for communications sent or received on behalf of DuPont >>Performance Coatings., which is not affiliated in any way with the >>DuPont Company. >> >>Francais Deutsch Italiano Espanol Portugues Japanese Chinese >>Korean >> >> http://www.DuPont.com/corp/email_disclaimer.html >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > >This communication is for use by the intended recipient and contains >information that may be Privileged, confidential or copyrighted under >applicable law. If you are not the intended recipient, you are hereby >formally notified that any use, copying or distribution of this e-mail, >in whole or in part, is strictly prohibited. Please notify the sender by >return e-mail and delete this e-mail from your system. Unless explicitly >and conspicuously designated as "E-Contract Intended", this e-mail does >not constitute a contract offer, a contract amendment, or an acceptance >of a contract offer. 
This e-mail does not constitute a consent to the >use of sender's contact information for direct marketing purposes or for >transfers of data to third parties. > >The dupont.com web address will continue in use for a >transitional period for communications sent or received on behalf of >DuPont >Performance Coatings., which is not affiliated in any way with the DuPont >Company. > >Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean > > http://www.DuPont.com/corp/email_disclaimer.html > From Xia.Cao at dupont.com Wed Sep 25 09:31:25 2013 From: Xia.Cao at dupont.com (Xia.Cao at dupont.com) Date: Wed, 25 Sep 2013 14:31:25 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hi Carson, Thank you for the message and your kind help. We tested maker2 by setting keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun . But it seemed maker2 started to launch all predictors again and it took long time to finish. I wonder if there is any way that we can directly get gene/mRNA/exons/CDS gff file without re-running maker2 to convert match/match_part features into gene/mRNA/exons/CDS. Thanks, Xia -----Original Message----- From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Thursday, September 19, 2013 5:58 PM To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY; maker-devel at yandell-lab.org Subject: Re: [maker-devel] maker2 scripts for functional annotation Hello Corban & Xia, Some scripts like gff3_preds2models are deprecated. To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pref_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS. For functional annotation, you can use Interproscan, BLASTP against UniProt, or BALST2GO. 
My preference is to use InterProScan to add GO terms and proteins domains via the ipr_update_gff and iprscan2gff3 scripts. Then add putative gene functions via BLASTP to UniProt and maker_functional_fasta and maker_functional_gff scripts. Go ahead and take a look and that those tools and let me know if you have any questions or need help you configuring them. Thanks, Carson On 9/19/13 11:53 AM, "Mark Yandell" wrote: >Hi Corban & Xia, > > >I've forwarded your question along to the MAKER_dev list, were you can >get speedy answers to your maker related questions. Thanks for using >MAKER. > >--mark > > >Mark Yandell >Professor of Human Genetics >H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of >Human Genetics University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >ph:801-587-7707 > >________________________________________ >From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] >Sent: Thursday, September 19, 2013 11:49 AM >To: Mark Yandell; Corban-Gregory.Rivera at dupont.com >Subject: maker2 scripts for functional annotation > >Dr. Yandell, > >We were recently evaluating maker2 for annotation and going through the >maker tutorial from 2012. > >http://gmod.org/wiki/MAKER_Tutorial_2012 > >The tutorial makes references to some scripts that we couldn?t find in >the current release. We were looking for scripts like >gff3_preds2models to convert match/match_part format into annotations >with gene/mRNA/exons/CDS and others. I was wondering if maybe we did >not have the most up to date version. > >In addition to getting accurate gene annotations, I was looking for a >solution to get functional assignments. I see that there are some >scripts like maker_functional_fasta that may help, but I was wondering >what you would recommend. > >Thanks, > >Corban & Xia > >This communication is for use by the intended recipient and contains >information that may be Privileged, confidential or copyrighted under >applicable law. 
If you are not the intended recipient, you are hereby >formally notified that any use, copying or distribution of this e-mail, >in whole or in part, is strictly prohibited. Please notify the sender >by return e-mail and delete this e-mail from your system. Unless >explicitly and conspicuously designated as "E-Contract Intended", this >e-mail does not constitute a contract offer, a contract amendment, or >an acceptance of a contract offer. This e-mail does not constitute a >consent to the use of sender's contact information for direct marketing >purposes or for transfers of data to third parties. > >The dupont.com web address will continue in use for a transitional >period for communications sent or received on behalf of DuPont >Performance Coatings., which is not affiliated in any way with the >DuPont Company. > >Francais Deutsch Italiano Espanol Portugues Japanese Chinese >Korean > > http://www.DuPont.com/corp/email_disclaimer.html > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties. 
From Ambrose.Andongabo at rothamsted.ac.uk Thu Sep 26 06:23:13 2013
From: Ambrose.Andongabo at rothamsted.ac.uk (Ambrose Andongabo (RRes-Roth))
Date: Thu, 26 Sep 2013 11:23:13 +0000
Subject: [maker-devel] Using RNA-seq data from tophat/cufflinks in maker
Message-ID:

Dear Carson,

I have been successfully running the MAKER pipeline to improve gene annotations. Strangely, after visualizing my data in GBrowse, I noticed that although my density and coverage plots, and even the raw read plots, clearly show a gene feature in a particular region (confirmed by the cufflinks track), this gene is not called by MAKER, so my annotation is not improved as I expected.

I think the problem starts where I converted the cufflinks GTF files to GFF3 using the script you provided (cufflinks2gff3). I would be pleased if you could help explain how I can perform the conversion so that the result is a proper GFF3 file that MAKER will then use to instruct the gene predictors.

Many thanks in advance
Ambrose

--
This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning.
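The cufflinks2gff3 script itself is not reproduced in this thread. The minimal Python sketch below is offered only as an illustration of the general idea behind such a conversion. It assumes the Cufflinks GTF has one "exon" row per exon, with a transcript_id attribute on each row.

The sketch works as follows:
- It groups exons by transcript.
- Each transcript becomes one GFF3 "match" feature spanning all of its exons.
- Each exon becomes a "match_part" whose Parent is that match.

The function names, the "cufflinks" source value and the ID scheme are hypothetical. The real script's feature types and attributes may differ, for example in whether it writes a Target attribute. Compare against its actual output before passing anything to est_gff=.

```python
def parse_gtf_attributes(column):
    """Parse a GTF attribute column such as: gene_id "G1"; transcript_id "T1";"""
    attrs = {}
    for item in column.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gtf_to_match_gff3(gtf_lines, source="cufflinks"):
    """Hypothetical converter: GTF exon rows -> GFF3 match/match_part lines.

    Assumes all exons of a transcript share one seqid and strand.
    """
    transcripts = {}  # transcript_id -> [(seqid, start, end, strand)]; dicts keep insertion order
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon rows describe the spliced structure
        tid = parse_gtf_attributes(cols[8]).get("transcript_id")
        if tid is None:
            continue  # skip rows we cannot assign to a transcript
        transcripts.setdefault(tid, []).append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6]))

    out = ["##gff-version 3"]
    for tid, exons in transcripts.items():
        exons.sort(key=lambda exon: exon[1])  # order exons by genomic start
        seqid, strand = exons[0][0], exons[0][3]
        start, end = exons[0][1], max(exon[2] for exon in exons)
        # Parent feature spanning the whole transcript.
        out.append("\t".join([seqid, source, "match", str(start), str(end),
                              ".", strand, ".", "ID=%s;Name=%s" % (tid, tid)]))
        # One child feature per exon.
        for n, (_, ex_start, ex_end, _) in enumerate(exons, 1):
            out.append("\t".join([seqid, source, "match_part", str(ex_start),
                                  str(ex_end), ".", strand, ".",
                                  "ID=%s:exon%d;Parent=%s" % (tid, n, tid)]))
    return out


if __name__ == "__main__":
    demo = [
        'chr1\tCufflinks\texon\t100\t200\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
        'chr1\tCufflinks\texon\t400\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    ]
    for row in gtf_to_match_gff3(demo):
        print(row)
```

Run on the demo input, which describes one two-exon transcript, it prints:
- a "##gff-version 3" header,
- one match line covering 100-500,
- two match_part lines (100-200 and 400-500) that point back to CUFF.1.1.

Grouping on transcript_id keeps each assembled transcript as a single piece of evidence. A region can still be missed after conversion, and the cause may lie elsewhere in the run rather than in the file format.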
From carsonhh at gmail.com Fri Sep 27 05:48:29 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Sep 2013 06:48:29 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

From: Carson Holt
Date: Friday, September 27, 2013 6:42 AM
To:
Subject: Re: [maker-devel] maker2 scripts for functional annotation

If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing is the time spent verifying reading frames, etc.). If you leave other things on in the control files, then you will get a lot of other processes, as in a standard MAKER run.

Thanks,
Carson

From:
Date: Friday, September 27, 2013 4:34 AM
To: Carson Holt
Subject: Re: [maker-devel] maker2 scripts for functional annotation

Hi... Xia and Carson

I've been trying to do something similar to get maker gene models derived from CEGMA predictions. I thought it would be nice to use the CEGMA GFF rather than the protein FASTA, because the GFF includes exon structure.

The CEGMA output is a GFFv2 variant. I managed to get it into GFFv3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags. (The tags came out as EMBL/databank_entry, mRNA and CDS; I'm not sure if this is valid for pred_gff or not.)

If you run maker with pred_gff=my_file and keep_preds=1, with est2genome and protein2genome switched off, then you still get a lot of est2genome and BLAST activity. (I also had pred_stats=1 on one run.) You can prevent most of this by removing the EST and protein files from the config :-).
However, without EST and protein evidence you get no gene models. So (I guess; I'm new to maker too, so Carson, please correct me if I'm wrong) if you've already run est2genome and protein2genome, then pred_gff could be used to convert your GFF to maker models, provided you filter the maker gene models by source.

AFAICS, if you have EST and protein data configured but est2genome and protein2genome switched off, then maker will use these as evidence for your GFF. That means it has to align them, which could be mistaken for running those analyses.

Hope this helps, and apologies if I'm wrong!

On Wednesday, 25 September 2013 15:35:46 UTC+1, Carson Holt wrote:
> If it is launching predictors, then you have snaphmm or augustus_species
> set. You need to blank out all other options in the control files
> (including repeat masking options, proteins, ESTs, etc.) when trying to
> convert match/match_part to gene/mRNA/exons/CDS, or else those other
> programs will run.
>
> --Carson
>
> On 9/25/13 10:31 AM, "Xia... at dupont.com" wrote:
>
>> Hi Carson,
>>
>> Thank you for the message and your kind help. We tested maker2 by setting
>> keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun. But it
>> seemed maker2 started to launch all predictors again and it took a long
>> time to finish. I wonder if there is any way that we can directly get a
>> gene/mRNA/exons/CDS gff file without re-running maker2 to convert
>> match/match_part features into gene/mRNA/exons/CDS.
>>
>> Thanks,
>> Xia
>>
>> -----Original Message-----
>> From: Carson Holt [mailto:cars... at gmail.com]
>> Sent: Thursday, September 19, 2013 5:58 PM
>> To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY;
>> maker... at yandell-lab.org
>> Subject: Re: [maker-devel] maker2 scripts for functional annotation
>>
>> Hello Corban & Xia,
>>
>> Some scripts like gff3_preds2models are deprecated.
>> To get the same
>> result as was offered by gff3_preds2models, just give your
>> match/match_part features to pred_gff= in the maker_opts.ctl file, set
>> keep_preds=1, and run with all other options and predictors turned off.
>> The final MAKER result will be your match/match_part features converted
>> into gene/mRNA/exons/CDS.
>>
>> For functional annotation, you can use InterProScan, BLASTP against
>> UniProt, or Blast2GO. My preference is to use InterProScan to add GO
>> terms and protein domains via the ipr_update_gff and iprscan2gff3
>> scripts. Then add putative gene functions via BLASTP to UniProt and the
>> maker_functional_fasta and maker_functional_gff scripts.
>>
>> Go ahead and take a look at those tools and let me know if you have
>> any questions or need help configuring them.
>>
>> Thanks,
>> Carson

From carsonhh at gmail.com Fri Sep 27 05:48:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Sep 2013 06:48:52 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

So to give a little background to this, the question was how to turn match/match_part into gene/mRNA/exon/CDS like the old gff3_preds2models script.
The steps below will basically just turn maker into a feature type converter and ignore all its other capabilities. That being said, depending on what your final goal is, you might actually want to run it a different way. If your only goal is to blindly convert feature types, then those steps will work.

Thanks,
Carson

From: Carson Holt
Date: Friday, September 27, 2013 6:42 AM
To:
Subject: Re: [maker-devel] maker2 scripts for functional annotation

If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing is the time spent verifying reading frames, etc.). If you leave other things on in the control files, then you will get a lot of other processes, as in a standard MAKER run.

Thanks,
Carson

From uqslizbe at uq.edu.au Thu Sep 5 01:30:26 2013
From: uqslizbe at uq.edu.au (Selene Lizbeth Fernandez Valverde)
Date: Thu, 5 Sep 2013 17:30:26 +1000
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks
Message-ID:

Hi all,

I'm currently using Maker to reannotate the genome of the marine sponge.
We already have a set of Augustus predictions and gene models that I mapped back to the genome using the patched map2assembly script posted on the mailing list. We also have PASA transcripts (based on Trinity assemblies) and Cufflinks transcripts. I would like to include both Trinity and Cufflinks, as in some cases one outperforms the other.

I'm currently planning to provide the Trinity/PASA assemblies as FASTA to the "est" option and the Cufflinks assemblies as GFF3 to the "est_gff" option. I'm wondering if MAKER will take both types of evidence into account. Would it be better to merge the PASA and Cufflinks GFF3s using gff3_merge?

Thanks in advance for the advice,
Selene

**est_gff/est --> These are assumed to be correctly assembled and aligned around splice sites (MAKER uses exonerate to align around splice sites for ESTs in FASTA files). MAKER can use them to infer gene models directly (est2genome option), can use them as support for maintaining predictions, and can use them to modify structure and add UTR to predictions. If you let MAKER try to find alternative splice forms, they will be used to identify support for splice variants. How these cluster with other evidence will help MAKER infer gene boundaries in some cases. MAKER will also use splice sites inferred from the ESTs to inform gene predictors during the prediction step.

Selene Fernandez-Valverde Ph.D.
Postdoctoral Research Fellow
School of Biological Sciences
University of Queensland
St Lucia QLD 4072
Australia
uqslizbe at uq.edu.au

From carsonhh at gmail.com Thu Sep 5 05:04:43 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 07:04:43 -0400
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks
In-Reply-To:
Message-ID:

1. I'm wondering if MAKER will take into account both types of evidence?

Yes.

2. Would it be better to merge both PASA and cufflinks gff3s using gff3_merge?
You can provide them as a comma-separated list of files to the est_gff= option, or you can merge them using the gff3_merge script that comes with MAKER.

Unfortunately, there is no single best option for which evidence types to include. Every evidence type can contribute in its own way to the final results. When you test different evidence types, try running on a single large contig and manually view the results in a browser.

Thanks,
Carson

From j.zohren at qmul.ac.uk Thu Sep 5 09:58:39 2013
From: j.zohren at qmul.ac.uk (Jasmin Zohren)
Date: Thu, 5 Sep 2013 16:58:39 +0100
Subject: [maker-devel] Maker in the cloud
Message-ID: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Dear Maker developers,

I contacted you a while ago about my annotation of the birch genome (Betula nana). Because I keep running into problems on our cluster facilities at QMUL, I thought of moving to the cloud. As I am rather inexperienced in cloud computing, I have several questions:

1. To me it seems that there are two different Maker images on EC2 (ami-ea661f83 and ami-b10abed8). Which one is "the right one"?
2. Can I use this Maker AMI for the annotation of a whole genome, or is it only suitable for the tutorial tasks?
3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance; maybe that wasn't enough computing power? Or do you have another explanation for this?
4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I've also recently heard about a parallelisation module for use in the cloud developed by Era7, called "nispero", but I am not sure whether it is publicly available yet.
5. Do you have any experience of how long an annotation task in the cloud would take, and what the expected costs would be? The birch genome is only 500 MB in size, and currently I am simply annotating it with a SNAP-trained HMM. However, in the future I will feed it with RNA-seq data as well.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London
intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk

From carsonhh at gmail.com Thu Sep 5 10:26:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 12:26:08 -0400
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hello Jasmin,

I haven't used MAKER in parallel on the cloud before (just tutorial images). However, I believe there is an iPlant Atmosphere image available through iPlant with MAKER version 2.27. You can get a maximum of 16 CPUs per instance there. --> http://www.iplantcollaborative.org/discover/atmosphere

Alternatively, if you have any US-based collaborators, you can apply for a startup allocation on the Lonestar cluster via XSEDE. An allocation can be requested by any US-based researcher and only takes a few days to approve. --> https://www.xsede.org/

That cluster was used recently to process the largest genome ever annotated (the pine genome). Total run time will be less than a day on that cluster, because you can request thousands of CPUs for your job with very short queue wait times.

There is also work in progress to give access to MAKER on the same cluster via the iPlant Discovery Environment. I've CC'd Joshua Stein, who can correct me if I'm wrong. I believe that resource would be available to non-US-based researchers as well, and that it will be available in the very near future (potentially within the next month or less).
Perhaps someone else on the mailing list may want to share their experience using MAKER on the cloud?

Thanks,
Carson

From barry.moore at genetics.utah.edu Thu Sep 5 12:06:05 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 5 Sep 2013 12:06:05 -0600
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
References: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hi Jasmin,

Like Carson, my only significant experience with MAKER in the cloud is using it for our training; however, I'll make some comments based on experience on the cloud with some of our other tools:

There are several cloud architectures available now, but I only have experience with Amazon EC2, so all comments are only relevant there.

I wouldn't use any of the existing MAKER AMIs. All of them were created for tutorial purposes, and while they should work fine for a real annotation job, they will be out of date. At the very least, if you use one, start with it, but install current MAKER code and save it as a new AMI. You can use MPI on the Amazon nodes, but it's not set up by default to run MPI between nodes. That can presumably be done, but we haven't done it, so there may be headaches involved; we just don't know for sure. However, you could split your input FASTA into several chunks of roughly equal size and fire up a different EC2 node for each FASTA file, then allow MAKER to use MPI to optimize parallelization on each node individually.
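Barry's suggestion of splitting the input FASTA into chunks of roughly equal size, one chunk per EC2 node, can be sketched as follows. This is a minimal illustration, not a MAKER utility, and all function names are hypothetical. It balances chunks by total sequence length rather than by contig count. The balancing heuristic is a simple greedy one: place the longest sequences first, each into whichever chunk is currently lightest.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, in file order."""
    records: List[Record] = []
    header = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_balanced(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedily assign records to n_chunks so total sequence length per chunk is similar.

    Longest sequences are placed first, each into the chunk that is currently
    lightest (tracked with a min-heap of (total_length, chunk_index)).
    """
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (total + len(seq), idx))
    return chunks


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA with fixed-width sequence lines."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Writing each chunk with `to_fasta` to its own file (for example, hypothetical names like `chunk_0.fasta`, `chunk_1.fasta`) gives one MAKER input per node. Balancing by length rather than contig count matters for fragmented assemblies: otherwise a few large scaffolds could leave one node running long after the others have finished.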
MAKER is really good at restarting if things fail, so with that in mind I'd suggest starting spot nodes, which can be 10X cheaper than regularly priced nodes. Amazon will kill a spot node as soon as someone comes along who is willing to pay full price, so you'd want a way (either manually checking and restarting nodes or scripting an AWS API solution) to check whether nodes finished and restart them if they did not, but you could save a lot of money by doing this.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ejr at stowers.org Fri Sep 6 12:34:32 2013
From: ejr at stowers.org (Ross, Eric)
Date: Fri, 6 Sep 2013 18:34:32 +0000
Subject: [maker-devel] maker-devel Digest, Vol 64, Issue 4
In-Reply-To:
Message-ID:

It wouldn't be too difficult to run MAKER using something like StarCluster. StarCluster manages the cluster and nodes for you.

http://star.mit.edu/cluster/

It's not too difficult to use.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From bhall7 at hawaii.edu Wed Sep 11 14:23:28 2013
From: bhall7 at hawaii.edu (Brian Hall)
Date: Wed, 11 Sep 2013 10:23:28 -1000
Subject: [maker-devel] Question about phase for CDS with start codon
Message-ID: <5230D140.7010804@hawaii.edu>

Aloha,

I'm working with a gff produced by maker. (I didn't run the program myself, but I believe it was version 2.24.) Here are the lines in question:

scaffold00033 maker CDS 729494 729949 . - 2 ID=107343;Name=BDOR_005037-RC:cds:250;Parent=107334
scaffold00033 maker start_codon 729947 729949 . - . ID=107349;Name=BDOR_005037-RB:start1;Parent=107334

If I understand correctly, the start codon in this reverse-strand CDS is from position 729949 to 729947 -- the first three bases in the CDS. However, the phase value for the CDS is 2, which essentially skips the start codon.
Downstream software (tbl2asn) is kicking up a "missing start codon" error.

I have several hundred such issues in the gff for a single genome. They generally only occur on reverse-strand CDSs. Any ideas?

Sincerest apologies if this is a duplicate question or if I've provided incomplete information. I am new at this. Thanks for your help!

--Brian

From ckuanglim at gmail.com Wed Sep 11 23:42:38 2013
From: ckuanglim at gmail.com (Chan Kuang Lim)
Date: Thu, 12 Sep 2013 13:42:38 +0800
Subject: [maker-devel] Exon Type in MAKER GFF Output
Message-ID:

Dear Maker developers,

I have a question regarding the GFF output of MAKER. When we look at CDS and Exon, we do not know whether they are initial, internal, terminal or single. How can we capture the exon type from MAKER output?

Thanks,
Chan KL

From carsonhh at gmail.com Thu Sep 12 08:21:48 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 10:21:48 -0400
Subject: Re: [maker-devel] Exon Type in MAKER GFF Output
In-Reply-To:
Message-ID:

That information is not explicit in GFF3 format. You have to capture all exons parented onto the mRNA, then sort them to identify whether each exon is 5-prime, 3-prime, internal, or a single exon.

--Carson
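Carson's recipe for Chan's question (collect the exons parented to an mRNA, sort them, then label them) can be sketched as below. Walking the CDS segments in the same 5'-to-3' order is also how the GFF3 phase in Brian's question is defined: the number of bases to skip at the 5' end of a segment to reach the first base of the next codon.

This is a minimal sketch under stated assumptions. Features are assumed to have already been parsed from the GFF3 and grouped by Parent into lists of 1-based, inclusive (start, end) pairs. All function names are hypothetical. On the minus strand, transcript order is descending coordinate order.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive as in GFF3


def transcript_order(segments: List[Interval], strand: str) -> List[Interval]:
    """Order segments 5'->3' along the transcript (descending coordinates on '-')."""
    return sorted(segments, key=lambda seg: seg[0], reverse=(strand == "-"))


def exon_types(exons: List[Interval], strand: str) -> List[Tuple[Interval, str]]:
    """Label each exon of one mRNA as single, initial, internal or terminal."""
    ordered = transcript_order(exons, strand)
    if len(ordered) == 1:
        return [(ordered[0], "single")]
    last = len(ordered) - 1
    labels = []
    for i, exon in enumerate(ordered):
        kind = "initial" if i == 0 else "terminal" if i == last else "internal"
        labels.append((exon, kind))
    return labels


def cds_phases(cds: List[Interval], strand: str,
               first_phase: int = 0) -> List[Tuple[Interval, int]]:
    """Expected GFF3 phase of each CDS segment, walking 5'->3'.

    first_phase is 0 when the CDS starts with a complete start codon.
    """
    result = []
    phase = first_phase
    for start, end in transcript_order(cds, strand):
        result.append(((start, end), phase))
        leftover = (end - start + 1 - phase) % 3  # bases of an unfinished codon
        phase = (3 - leftover) % 3               # bases the next segment must supply
    return result


def first_codon_span(first_cds: Interval, strand: str) -> Interval:
    """Genomic span of the first three transcript-order bases of a CDS segment."""
    start, end = first_cds
    return (end - 2, end) if strand == "-" else (start, start + 2)
```

Apply this to Brian's CDS line (729494-729949, minus strand). `first_codon_span` returns (729947, 729949), which matches his start_codon feature. Because the segment begins with a complete start codon, `cds_phases` assigns it phase 0, so the phase of 2 on that line is inconsistent with the start_codon feature. Whether that inconsistency is what triggers the tbl2asn error also depends on how the GFF3 is converted to tbl.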
From carsonhh at gmail.com Thu Sep 12 09:27:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 11:27:44 -0400
Subject: Re: [maker-devel] Question about phase for CDS with start codon
In-Reply-To: <5230D140.7010804@hawaii.edu>
Message-ID:

I know there was an incorrect phase issue in a previous MAKER version that is now fixed, but I really doubt that is the issue causing your error. What are you using to convert from GFF3 to tbl format before using tbl2asn? I'd start there. We can send you a GFF3 to tbl converter if that will help.

--Carson

From marc.hoeppner at imbim.uu.se Fri Sep 13 02:15:29 2013
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Fri, 13 Sep 2013 10:15:29 +0200
Subject: [maker-devel] Maker pass-through behavior
Message-ID: <5232C9A1.3060709@imbim.uu.se>

Dear list,

I have started using Maker to explore its use for a number of genome projects we are planning on running. One of the tools we intend on incorporating into our pipeline is PASA (since we will be using Trinity etc.). The (cleaned) output with predicted gene structures I would like to pass to Maker as pass-through annotation (I am optimistic that way...) - but I noticed that doing so does not always result in the incorporation of the PASA gene model into the final maker annotation track. Sometimes it seems to be superseded by an Augustus/Maker model, sometimes the region stays empty (even though a protein alignment is present).

So my question is how Maker handles pass-throughs, exactly. Can it reject pass-throughs, or should it always use such models over any other data source? Is there any scenario where it wouldn't?

I understand that Maker uses some internal scoring system to estimate the accuracy of an annotation - could that be a reason? It would be a bit odd though, since a lift-over from chicken (to our bird genome) seems to support gene models produced by PASA, yet they are nowhere to be found in the final models.

And a related question: Is there a comprehensive documentation where I can get more information on the internal decision making process of Maker? Or do I have to dig into the code for that?

Cheers,

Marc

PS I have attached a screenshot of such an example - the green track is Maker with proteins + augustus (chicken models) + PASA pass-through of a cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins and trinity transcripts as EST evidence, Black: chicken lift-overs from EnsEMBL)
From carsonhh at gmail.com Sun Sep 15 12:39:29 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 15 Sep 2013 14:39:29 -0400
Subject: Re: [maker-devel] Maker pass-through behavior
In-Reply-To: <5232C9A1.3060709@imbim.uu.se>
Message-ID:

> So my question is how Maker handles pass-throughs, exactly. Can it
> reject pass-throughs, or should it always use such models over any other
> data source? Is there any scenario where it wouldn't?

pred_gff is treated the same as any other ab initio prediction. It is just one among several candidate gene models. The model that is kept is the one with the lowest AED score (lower means better evidence match/support). Any model with no evidence support (i.e. AED=1) will be rejected unless keep_preds=1 is set. There is also another score, eAED, which takes into account protein reading frame (protein evidence must be in the same reading frame as the gene model). A model with eAED=1 will also be rejected.

> I understand that Maker uses some internal scoring system to estimate
> the accuracy of an annotation - could that be a reason?

Possibly. Look at the AED score of the pass-through model in the final MAKER GFF3 to see what it was. If you want to send me the GFF3 with a list of regions you are concerned about, I can tell you more.

Also consider giving the PASA results to est_gff as well, to bias the scoring algorithm toward maintaining those models (i.e. the model supports itself, which is reasonable since these are EST-derived anyway and not just ab initio predictions). Also, the model_gff option will always keep an input model (with or without evidence support) and will only replace it with something else if that something else has a better AED score.

> And a related question: Is there a comprehensive documentation where I
> can get more information on the internal decision making process of
> Maker?
> Or do I have to dig into the code for that?

Look at these two papers -->

Holt, C., and Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491.

Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67.

Thanks,
Carson
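Carson's selection rule can be illustrated with the AED definition from the Eilbeck et al. (2009) paper he cites. The rule is: keep the candidate with the lowest AED, and reject a candidate with AED=1 unless keep_preds=1. Let SN be the overlap divided by the evidence length and SP the overlap divided by the model length. Congruence is then C = (SN + SP)/2, and AED = 1 - C.

The sketch below is a simplified illustration, not MAKER's implementation. It scores each model against a single pooled set of evidence intervals at the base level. It also leaves out the eAED reading-frame check and MAKER's evidence clustering, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def covered_bases(intervals: List[Interval]) -> Set[int]:
    """Expand intervals into the set of positions they cover (fine for small examples)."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model: List[Interval], evidence: List[Interval]) -> float:
    """AED = 1 - (SN + SP) / 2, computed at the base level."""
    m, e = covered_bases(model), covered_bases(evidence)
    if not m or not e:
        return 1.0
    overlap = len(m & e)
    sn = overlap / len(e)  # share of the evidence covered by the model
    sp = overlap / len(m)  # share of the model supported by evidence
    return 1.0 - (sn + sp) / 2.0


def pick_model(candidates: Dict[str, List[Interval]], evidence: List[Interval],
               keep_preds: bool = False) -> Optional[str]:
    """Return the name of the lowest-AED candidate, or None if all are rejected."""
    scores = {name: aed(exons, evidence) for name, exons in candidates.items()}
    if not keep_preds:
        # Without keep_preds, unsupported models (AED == 1) are dropped.
        scores = {name: s for name, s in scores.items() if s < 1.0}
    if not scores:
        return None
    return min(scores, key=scores.get)
```

For example, a model covering bases 1-100 scored against evidence covering 51-150 has SN = SP = 0.5, so its AED is 0.5. A model with no overlapping evidence scores 1.0 and is dropped unless `keep_preds=True`.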
From mhinsley at ebi.ac.uk Mon Sep 16 03:51:35 2013
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 16 Sep 2013 10:51:35 +0100
Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes
In-Reply-To: <5232C9A1.3060709@imbim.uu.se>
References: <5232C9A1.3060709@imbim.uu.se>
Message-ID: <5236D4A7.6080303@ebi.ac.uk>

Hello

I'm trying to get maker to run on MPI using several nodes. I have an installation set up by a colleague which includes maker 2.27 and openmpi-1.4.3. Previously it has only been used (here at EBI) with maker processes running on one node only, but I find that it can wait a very long time before being scheduled by LSF.

The command used to submit is like this (as per recommendations from systems; it uses 8 CPUs on each of 8 nodes):

    export OMP_NUM_THREADS=64
    bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1

and it requires the environment to be set up in ~/.bashrc for OpenMPI.

This runs but produces a lot of errors like:

DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../lib/GFFDB.pm line 407.
I've looked at https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/maker-devel/TscBgbQfBX4/pae016DqlIMJ which suggests that "It means that your GFF3 results will not be integrated" (I'm not sure what's meant by that, but the number of genes I'm getting is around 2k, where I'd expect more like 15k) and that the problem is SQLite using NFS (a known issue), and the fix is to use /tmp. I have TMP= set as per default in maker_opts.ctl, and there are maker directories in /tmp on the runtime nodes, but the database (I guess) is in /nfs/...../maker//.scf.db.

I don't see how I could set the working directory to a non-NFS file system and still use more than one node, but this error only seems to appear (so far) with est2genome, not when running SNAP/Augustus.

Is there a workaround to stop getting the locked error, or some way to recover from it after maker has finished? Or is it necessary to run the est2genome step (or maker generally) on one node? An obvious option is to split the assembly, but I was hoping to avoid that.

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Tue Sep 17 21:35:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Sep 2013 21:35:52 -0600
Subject: Re: [maker-devel] SQLite database locked error, maker MPI using several nodes
In-Reply-To: <5236D4A7.6080303@ebi.ac.uk>
Message-ID:

Sorry for the slow reply, I'm currently traveling.

Try deleting any *.db files in the maker output directory to force the SQLite database to be rebuilt. Also you can try the current version of MAKER at yandell-lab.org.

MAKER is supposed to try and copy the database to the /tmp directory before it starts work.
That way the actual working copy will be local, and will be independent for each node.

--Carson
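Carson describes a pattern: copy the SQLite database from shared (NFS) storage to a node-local directory such as /tmp, so each node works on its own independent copy. That pattern can be sketched with Python's standard sqlite3, shutil and tempfile modules.

This illustrates the general technique only; it is not MAKER's code, and the function name is hypothetical. Changes made to the local copy are not written back to the shared file. The sketch also assumes nothing is writing to the shared file while it is being copied.

```python
import os
import shutil
import sqlite3
import tempfile
from typing import Optional


def open_local_copy(shared_db_path: str, local_root: Optional[str] = None) -> sqlite3.Connection:
    """Copy a SQLite file off shared storage and open the node-local copy.

    local_root=None lets tempfile pick the system temp dir (usually /tmp).
    Writes go to the local copy only; they are NOT synced back.
    """
    # Assumes no other process is writing the shared file during the copy.
    local_dir = tempfile.mkdtemp(prefix="gffdb_", dir=local_root)
    local_path = os.path.join(local_dir, os.path.basename(shared_db_path))
    shutil.copy2(shared_db_path, local_path)  # copies data and timestamps
    return sqlite3.connect(local_path)
```

Because each node opens its own file, SQLite's file locking never has to work across NFS. NFS locking is the source of the problem described in the thread Malcolm links.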
From carsonhh at gmail.com Tue Sep 17 21:57:12 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Sep 2013 21:57:12 -0600
Subject: Re: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

It does sound like this is likely the result of gene fusion from the Trinity assemblies. One thing to look at is the number of coding exons compared to the other ant species. See if the increase in exons is mostly in UTR, coding sequence, or both.

One thing you could try is running MAKER without the EST evidence, just to see how many genes you get with protein-only support.

There are ways to use multiple MAKER runs to tease out details of the data. For example:

run1: protein evidence only, plus ab initio predictors like SNAP and Augustus.

run2: protein and EST evidence. Models from run1 are passed in as pred_gff with SNAP and Augustus turned off (this will force the addition of UTR, but not the generation of new models). Use the correct_est_fusion=1 option here to clip UTR that runs into neighboring genes.

run3: protein and EST evidence plus Augustus and SNAP.

Then take the models from run2 and the models from run3 that do not overlap run2, and add them all to your final set along with any models that come from InterProScan domain analysis of rejected models.
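The final merge step in the three-run strategy above is plain interval bookkeeping. It keeps every run2 model, adds the run3 models that do not overlap any run2 model, and then adds the InterProScan-rescued models.

Below is a minimal sketch under stated assumptions. Each model is reduced to its gene span as (seqid, strand, start, end). Whether overlap should be strand-specific is left as a parameter. The rescued set is assumed to be already deduplicated. The names are hypothetical.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    seqid: str
    strand: str
    start: int  # gene span, 1-based inclusive
    end: int


def overlaps(a: Model, b: Model, same_strand_only: bool = True) -> bool:
    """True if two gene spans share at least one base (optionally on the same strand)."""
    if a.seqid != b.seqid:
        return False
    if same_strand_only and a.strand != b.strand:
        return False
    return a.start <= b.end and b.start <= a.end


def merge_runs(run2: List[Model], run3: List[Model], rescued: List[Model],
               same_strand_only: bool = True) -> List[Model]:
    """run2 models, plus run3 models that overlap no run2 model, plus rescued models."""
    final = list(run2)
    for model in run3:
        if not any(overlaps(model, kept, same_strand_only) for kept in run2):
            final.append(model)
    final.extend(rescued)
    return final
```

The pairwise check is quadratic in the number of models, which is acceptable for a sketch. Sorting the models on each sequence by start coordinate would let it scale to a whole genome.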
This solution is rather lengthy, but may avoid many of the problems you seem to be getting with gene merging even with jaccard_clip and correct_est_fusion turned on, because your ESTs would only contribute to the UTR and to models not found based solely on protein evidence (i.e., they would be ignored in cases where you get enough evidence from other sources). --Carson From: Benjamin Rubin Date: Tuesday, September 17, 2013 10:08 AM To: Carson Holt Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, The new version is working great. Thanks for your help. I do have another more general question. I am working on annotating a new ant genome (Pseudomyrmex gracilis) and the results that I am getting from MAKER are a bit unexpected. The number of genes produced by MAKER is ~14,300 while, as you may know, the seven published ant genomes have at least 16,000 genes (my count improved by several hundred after turning on correct_est_fusion). Running the ab initio predictions through InterProScan yields ~900 additional genes for P. gracilis, so there are still substantially fewer genes found for this species. This difference on its own is not that unexpected; Pseudomyrmex likely diverged from the other sequenced ants by over 100 million years and the genome sequence itself is rather fragmented and incomplete. However, what is bothering me is that, despite having fewer genes, I am seeing substantially larger numbers of exons (~92,000 as opposed to 78-85,000) and the total length of all proteins is more than a million amino acids longer in P. gracilis. It does not have unexpectedly long genes; the average gene length is just a bit higher. I have looked at the annotations of some conserved genes and found some apparently spurious exons merged with these genes. I say that they are spurious because they go beyond the end of the gene sequence in other species (ants and Drosophila).
Unfortunately, it appears that many of these spurious calls are primarily the result of blast hits to my EST data. The ESTs generally seem to blast to the genome a bit more often than expected. Partly as a result of the relatively high repeat content of my genome (~50% complex repeats) and partly because we only used two Illumina libraries, my genome sequence is quite fragmented (~280Mb in ~6,500 scaffolds). Note that the total genome length is estimated at 387Mb, so I am missing a fair amount, but almost all CEGMA genes are present in the assembly, so I have concluded that the missing sequence is predominantly repeats. I have no prior reason to expect that my EST library has anything wrong with it. I did a single Illumina lane of RNA-seq and assembled it in Trinity with the jaccard_clip option on to reduce gene fusions. If you have any advice on how my gene predictions can be improved, I would really appreciate it. Have you heard of this kind of problem before? Is there a way to limit the influence of ESTs without discarding them entirely? Thanks so much for your help with the fusion bug and for any advice here. Ben On Wed, Sep 11, 2013 at 9:27 AM, Benjamin Rubin wrote: > Hi Carson, > > OK, I will try it and let you know how it goes. And thanks for the suggestion > about using always_complete as well. > > Thanks! > Ben > > > On Tue, Sep 10, 2013 at 9:45 PM, Carson Holt wrote: >> I think I have it fixed. Sorry it took so long, but my original fix actually >> created other odd behaviors, so I had to track those down as well. >> >> You can download the test version with the fix by typing this on the command >> line --> >> >> svn co ********* >> >> user: ***** >> password: ***** >> >> Test it out and let me know. On the contig you sent me, I also set >> always_complete=1 as some of the hint-based models were lacking start or stop >> codons. The results looked slightly better that way as well.
>> >> Thanks, >> Carson >> >> >> >> From: Benjamin Rubin >> Date: Wednesday, September 4, 2013 10:07 AM >> To: Carson Holt >> >> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >> >> OK, great. Thanks for letting me know. >> >> Ben >> >> >> On Wed, Sep 4, 2013 at 9:00 AM, Carson Holt wrote: >>> I thought I'd give you an update on this. I've verified the bug and think >>> I've identified roughly where it's happening. I'll have a fix for you to >>> test soon. >>> >>> --Carson >>> >>> >>> From: Benjamin Rubin >>> >>> Date: Wednesday, August 28, 2013 4:16 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>> >>> Hi Carson, >>> >>> OK, I think I uploaded all of the necessary files. I made a directory named >>> "rubin_data" for everything. I included both the full genome file >>> ("ec_patch...") as well as a file for scaffold_1. For this scaffold, I get >>> 132 genes when correct_est_fusion is off and 35 when it is on. These results >>> are after running maker a first time with correct_est_fusion on and >>> retraining SNAP/Augustus on the results. The SNAP file is >>> "gracilis_round_1.hmm" and I think the necessary Augustus files are in the >>> "gracilis_jaccard_flank100_corrfusion_round_1_results" directory. I also >>> included gff files for scaffold_1 with and without correct_est_fusion turned >>> on. >>> >>> Let me know if there is anything else that I failed to upload. I really >>> appreciate your time. Thanks so much. >>> >>> Ben >>> >>> >>> On Wed, Aug 28, 2013 at 9:59 AM, Benjamin Rubin >>> wrote: >>>> Hi Carson, >>>> >>>> Yes, I would be happy to upload the necessary data. Just let me know the >>>> connection information. >>>> >>>> Thanks! 
>>>> Ben >>>> >>>> >>>> On Wed, Aug 28, 2013 at 8:09 AM, Carson Holt wrote: >>>>> Could you pick one contig where the number of genes shifts dramatically and >>>>> upload that contig fasta together with your control files and any evidence >>>>> datasets used to one of our servers (I'm going to send you connection >>>>> details in a separate e-mail). I can then run with and without >>>>> correct_est_fusion to see if there is anything unexpected going on. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> >>>>> From: Benjamin Rubin >>>>> Date: Tuesday, August 27, 2013 10:59 AM >>>>> To: Carson Holt >>>>> Cc: >>>>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>>>> >>>>> Hi Carson, >>>>> >>>>> I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but >>>>> I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank >>>>> at 100). This is using only the first round with SNAP and Augustus trained >>>>> on the CEGMA genes. Is there anything else that I might be doing wrong? I >>>>> have attached my control file in case that could be useful. >>>>> >>>>> Thanks for the help! >>>>> Ben >>>>> >>>>> >>>>> On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: >>>>>> The correct_est_fusion option just clips UTR on overlapping genes. I >>>>>> suspect the real problem is setting pred_flank too low. If your lead-in >>>>>> sequence to a gene is too short, ab initio predictors won't call it. So >>>>>> you are probably getting empty reports from SNAP/Augustus for the hint-based >>>>>> predictions. Try increasing pred_flank to at least 150. Setting >>>>>> pred_flank too low will also limit how far MAKER will walk out along the >>>>>> edges of initial alignments during the polishing step (exonerate). So >>>>>> setting it too low may also be causing you to lose some EST and protein >>>>>> alignments.
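Carson's point about lead-in sequence is easier to see with a little arithmetic. The sketch below pads a 1-based, inclusive span by a flank on each side and clips the result to the contig ends. It is a hypothetical helper that only illustrates the idea of a flanking window; it is not MAKER's pred_flank code, whose details may differ.

```python
def flanked_window(start, end, flank, contig_length):
    """Pad a 1-based, inclusive [start, end] span by `flank` bp per side, clipped to the contig."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if not 1 <= start <= end <= contig_length:
        raise ValueError("span must lie within the contig")
    return max(1, start - flank), min(contig_length, end + flank)


def upstream_context(start, flank):
    """Bases actually available before `start` once the window is clipped at position 1."""
    return start - max(1, start - flank)


if __name__ == "__main__":
    # A hypothetical alignment at 5,000-6,200 on a 20 kb contig, under two flank settings.
    for flank in (100, 200):
        print(flank, flanked_window(5000, 6200, flank, 20000))
```

Running it prints "100 (4900, 6300)" and "200 (4800, 6400)": doubling the flank doubles the sequence a predictor sees on either side of the evidence, unless the window hits the contig edge, where upstream_context shows how much lead-in is really left.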
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Benjamin Rubin >>>>>> Date: Monday, August 26, 2013 2:20 PM >>>>>> To: >>>>>> Subject: [maker-devel] Unexpected results with correct_est_fusion >>>>>> >>>>>> Hello developers, >>>>>> >>>>>> I am using MAKER 2.28 to annotate an ant genome. I provide protein >>>>>> sequence evidence from all seven of the other sequenced ant genomes and a >>>>>> de novo assembled transcriptome as EST evidence. I assembled the >>>>>> transcriptome using Trinity with the jaccard_clip option turned on to >>>>>> reduce gene fusions. Despite using this set of hopefully non-fused ESTs, >>>>>> I still have substantial fusion problems with the final annotation. >>>>>> Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. >>>>>> However, correct_est_fusion leads to the prediction of a much smaller >>>>>> number of genes (~5,000 instead of ~14,000). I am initially training both >>>>>> SNAP and Augustus using CEGMA genes and then retraining based on the >>>>>> first round of annotation. Both rounds of annotation yield the same low >>>>>> number (~5,000) of genes. It may also be worth mentioning that the number >>>>>> of exons is also far lower when using correct_est_fusion (~26,000 instead >>>>>> of ~90,000). >>>>>> >>>>>> Is this the expected behavior of correct_est_fusion? I was surprised that >>>>>> it reduced the predicted number of genes by such a large margin. I am >>>>>> concerned that I am using it incorrectly. Do you have any other >>>>>> suggestions for reducing gene merging? 
>>>>>> >>>>>> Thanks, >>>>>> Ben >>>>>> >>>>>> -- >>>>>> _____________________________________________________ >>>>>> Benjamin ER Rubin >>>>>> PhD Candidate >>>>>> Committee on Evolutionary Biology >>>>>> University of Chicago >>>>>> http://www.moreaulab.org/Benjamin_Rubin.html >>>>>> >>>>>> Division of Insects >>>>>> Zoology Department >>>>>> Field Museum of Natural History >>>>>> 1400 South Lake Shore Drive >>>>>> Chicago, IL 60605 >>>>>> USA >>>>>> Office: (312) 665-7776 >> >> >> >> -- >>
_____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 From leshin at gmail.com Wed Sep 18 13:35:10 2013 From: leshin at gmail.com (Le-Shin Wu) Date: Wed, 18 Sep 2013 15:35:10 -0400 Subject: [maker-devel] running mpi MAKER Message-ID: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Hi, I am new to MAKER and just started to use MAKER for doing some genome annotations. I compiled the MAKER package with MPI support on our cluster. But when I used the command "mpiexec -n 64 -hostfile $PBS_NODEFILE maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl" to run my MPI MAKER job, I got a whole bunch of warning messages like the one shown below in my error log file. Does this warning mean that something is wrong? Thank you. (I requested 64 processors on two nodes.) STATUS: Processing and indexing input FASTA files... WARNING: Multiple MAKER processes have been started in the same directory.
Best LW From carsonhh at gmail.com Wed Sep 18 14:27:32 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 18 Sep 2013 14:27:32 -0600 Subject: [maker-devel] running mpi MAKER In-Reply-To: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Message-ID: It means either MAKER was not properly configured for MPI support, or the communication ring is not launching properly. Three things: 1. In the .../maker/src/ directory, run './Build status'. Does it say MPI_SUPPORT is configured or installed? 2. Run 'which mpiexec' on the command line. What is the path? Is it MPICH2 mpiexec, or OpenMPI, or something else? 3. Run 'mpiexec -n 64 -hostfile $PBS_NODEFILE hostname' on the command line. What does it print out? Thanks, Carson From lewu at indiana.edu Wed Sep 18 19:30:49 2013 From: lewu at indiana.edu (Le-shin Wu) Date: Wed, 18 Sep 2013 21:30:49 -0400 Subject: [maker-devel] running mpi MAKER In-Reply-To: References: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Message-ID: Hi Carson, Thanks a lot for your information. When I run './Build status', it shows the output below, and it looks like MPI SUPPORT is enabled.
==============================================================================
STATUS MAKER 2.27
==============================================================================
PERL Dependencies:      VERIFIED
External Programs:      VERIFIED
External C Libraries:   VERIFIED
MPI SUPPORT:            ENABLED
MWAS Web Interface:     DISABLED
MAKER PACKAGE:          CONFIGURATION OK
But when I run 'which mpiexec', it shows "/N/soft/mason/openmpi/1.5.4/gcc/bin/mpiexec". So I think I did not use the correct version of mpiexec while running my MAKER job. Thanks again. I will try my MAKER job again with the correct mpiexec from MPICH2. Best LW ____________________________________________ Le-Shin Wu Center for Computational Cytomics, Indiana University http://www.cs.indiana.edu/~lewu ____________________________________________ From mhinsley at ebi.ac.uk Thu Sep 19 09:37:17 2013 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Thu, 19 Sep 2013 16:37:17 +0100 Subject: [maker-devel] 2.27 and 2.28 incompatible Message-ID: <523B1A2D.7020300@ebi.ac.uk> To try to fix SQL lock file errors, I installed 2.28 and made the mistake of running it on a directory made by 2.27 (to run snap and augustus for the first time). Every contig fails due to errors like: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Can't open file 04723bc0c22478764d90bbaebca96d23 STACK: Error::throw STACK: Bio::Root::Root::throw /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/Root/Root.pm:472 STACK: Bio::DB::Fasta::fh /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:948 STACK: Bio::DB::Fasta::subseq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:929 STACK: Bio::PrimarySeq::Fasta::seq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:1089 STACK: FastaSeq::seq /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm:50 STACK: Process::MpiChunk::_go /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:478 STACK: Process::MpiChunk::run /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all
/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm:286 STACK: /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker:667 ----------------------------------------------------------- --> rank=NA, hostname=ebi3-198.ebi.ac.uk at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 38 Error::_throw_Error_Simple('HASH(0x388cb78)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 306 Error::subs::run_clauses('HASH(0x388cbf0)', '\x{a}------------- EXCEPTION: Bio::Root::Exception -------------\x{a}...', undef, 'ARRAY(0x38a0d18)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x38f93f8)', 'HASH(0x388cbf0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm line 95 FastaSeq::seq('FastaSeq=HASH(0x388dda0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 478 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x38a0e50)', 'run', 'HASH(0x38a0ec8)', 0, 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 357 Process::MpiChunk::run_all('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm line 286 Process::MpiTiers::run_all('Process::MpiTiers=HASH(0x3867960)', 0) called at /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker line 667 Is there an easy way to reset the datastore/ file names so that I can switch over to 2.28 without starting over (e.g. maker -dsindex)? I killed the job and ran 2.27 instead, which seems to be jim dandy.
From carsonhh at gmail.com Thu Sep 19 10:06:09 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 10:06:09 -0600 Subject: [maker-devel] 2.27 and 2.28 incompatible In-Reply-To: <523B1A2D.7020300@ebi.ac.uk> Message-ID: There is something very odd, because I've never seen those errors before, and 2.28 should use the same datastore structure as 2.27. I'm going to write a script that will print out certain configuration information about your install that might help me see what's going on. My plane is boarding now, so I'll send it to you later this evening. Thanks, Carson From myandell at genetics.utah.edu Thu Sep 19 11:53:48 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Thu, 19 Sep 2013 17:53:48 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Hi Corban & Xia, I've forwarded your question along to the MAKER_dev list, where you can get speedy answers to your maker-related questions. Thanks for using MAKER. --mark Mark Yandell Professor of Human Genetics H.A.
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] Sent: Thursday, September 19, 2013 11:49 AM To: Mark Yandell; Corban-Gregory.Rivera at dupont.com Subject: maker2 scripts for functional annotation Dr. Yandell, We were recently evaluating maker2 for annotation and going through the maker tutorial from 2012. http://gmod.org/wiki/MAKER_Tutorial_2012 The tutorial makes references to some scripts that we couldn't find in the current release. We were looking for scripts like gff3_preds2models to convert match/match_part format into annotations with gene/mRNA/exons/CDS and others. I was wondering if maybe we did not have the most up to date version. In addition to getting accurate gene annotations, I was looking for a solution to get functional assignments. I see that there are some scripts like maker_functional_fasta that may help, but I was wondering what you would recommend. Thanks, Corban & Xia This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
The dupont.com web address will continue in use for a transitional period for communications sent or received on behalf of DuPont Performance Coatings., which is not affiliated in any way with the DuPont Company. Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean http://www.DuPont.com/corp/email_disclaimer.html From carsonhh at gmail.com Thu Sep 19 15:58:16 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 15:58:16 -0600 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hello Corban & Xia, Some scripts like gff3_preds2models are deprecated. To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS. For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts. Then add putative gene functions via BLASTP to UniProt and the maker_functional_fasta and maker_functional_gff scripts. Go ahead and take a look at those tools and let me know if you have any questions or need help configuring them. Thanks, Carson From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 05:49:40 2013 From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL)) Date: Wed, 25 Sep 2013 11:49:40 +0000 Subject: [maker-devel] Path and contents of RepBase Message-ID: Hi, I'm getting the following error when I run maker v2.28: WARNING: RepBase is not installed for RepeatMasker. This limits RepeatMasker's functionality and makes the model_org option in the control files virtually meaningless. MAKER will now reconfigure for simple repeat masking only. In maker_opts.ctl I have: model_org=all In maker_exe.ctl I have: RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker Instructions in the GMOD maker tutorial state: "Unpack the contents of the RepBase tarball into the RepeatMasker/Libraries directory." So, I have RepBase located as follows: /RepeatMasker/4.0.3/x86_64/bin/Libraries/ The content of this directory is: RepBase18.08.embl/ RepBase18.08.fasta/ Could someone tell me how/where maker looks for RepBase and which files (embl? fasta? something else?) I need in there? Many thanks for your help, Graham Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH.
UK
Tel: +44 (0)1603 450601

From carsonhh at gmail.com Wed Sep 25 08:13:40 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:13:40 -0400
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

It's not MAKER that looks for RepBase, it is RepeatMasker. MAKER is just letting you know that you don't have it installed, so you are not surprised by the lack of results RepeatMasker gives you.

You must download RepBase separately from RepeatMasker. When you unpack it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file as well as other files in the .../RepeatMasker/Libraries/ directory. The header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will tell you if it is the minimal library or the full RepBase library.

You have also downloaded the incorrect format, since you have directories named RepBase18.08.embl. You need to go to http://www.girinst.org/server/RepBase/index.php and download the RepeatMasker edition, not the EMBL format one. The contents should be named exactly .../Libraries/RepeatMaskerLib.embl.

Here is a direct link --> http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz

Make sure you are in the .../RepeatMasker/ directory before unpacking the tarball, or you won't get the proper file replacement behavior.

See the RepeatMasker installation instructions here --> http://www.repeatmasker.org/RMDownload.html

Thanks,
Carson

From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 08:29:53 2013
From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL))
Date: Wed, 25 Sep 2013 14:29:53 +0000
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

Hi Carson,
Many thanks for the explanation of how RepBase works. I followed your instructions and maker no longer complains.
Thanks for your help,
Graham

Dr. Graham Etherington
Bioinformatics Support Officer,
The Sainsbury Laboratory,
Norwich Research Park,
Norwich NR4 7UH.
UK
Tel: +44 (0)1603 450601

From carsonhh at gmail.com Wed Sep 25 08:32:33 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:32:33 -0400
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

Glad it worked. If you have any other questions, just let us know.

Thanks,
Carson

From carsonhh at gmail.com Wed Sep 25 08:35:46 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:35:46 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

If it is launching predictors, then you have snaphmm or augustus_species set. You need to blank out all other options in the control files (including repeat masking options, proteins, ESTs, etc.) when trying to convert match/match_part to gene/mRNA/exons/CDS, or else those other programs will run.

--Carson

From Xia.Cao at dupont.com Wed Sep 25 08:31:25 2013
From: Xia.Cao at dupont.com (Xia.Cao at dupont.com)
Date: Wed, 25 Sep 2013 14:31:25 +0000
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
References: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu>
Message-ID:

Hi Carson,

Thank you for the message and your kind help. We tested maker2 by setting keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun. But it seemed maker2 started to launch all predictors again, and it took a long time to finish. I wonder if there is any way that we can directly get a gene/mRNA/exons/CDS GFF file without re-running maker2 to convert match/match_part features into gene/mRNA/exons/CDS.

Thanks,
Xia

-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Thursday, September 19, 2013 5:58 PM
To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker2 scripts for functional annotation

Hello Corban & Xia,

Some scripts like gff3_preds2models are deprecated. To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS.

For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts. Then add putative gene functions via BLASTP to UniProt and the maker_functional_fasta and maker_functional_gff scripts.

Go ahead and take a look at those tools and let me know if you have any questions or need help configuring them.

Thanks,
Carson

From Ambrose.Andongabo at rothamsted.ac.uk Thu Sep 26 05:23:13 2013
From: Ambrose.Andongabo at rothamsted.ac.uk (Ambrose Andongabo (RRes-Roth))
Date: Thu, 26 Sep 2013 11:23:13 +0000
Subject: [maker-devel] Using RNA-seq data from tophat/cufflinks in maker
Message-ID:

Dear Carson,

I have been successfully running the MAKER pipeline trying to improve gene annotations. Strangely, after trying to visualize my data in GBrowse, I noticed that although my density and coverage plots and even raw read plots clearly show that there is a gene feature in a particular region (confirmed by the cufflinks track), it is not called by MAKER and thus is not improving my annotation as I expected.

I think the problem starts where I converted the cufflinks GTF files to GFF3 using the script you provided (cufflinks2gff3). I would be pleased if you could explain how I can perform the conversion so that it produces a proper GFF3 file that MAKER will then use to instruct the gene predictors.

Many thanks in advance
Ambrose
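As a rough illustration of the conversion asked about above (and not a reimplementation of MAKER's cufflinks2gff3 script), the Python sketch below regroups Cufflinks GTF exon lines by their transcript_id attribute and writes one GFF3 match feature per transcript, with a match_part child for each exon, the match/match_part layout discussed elsewhere in this thread. The function names, the default feature type `match`, the `ID`/`Parent` naming scheme, and the idea of passing such a file to MAKER as est_gff evidence are all assumptions; check them against the MAKER documentation for your version.

```python
import re
import sys
from collections import defaultdict

# GTF column 9 holds pairs like: gene_id "CUFF.1"; transcript_id "CUFF.1.1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Return the GTF attribute column as a dict, e.g. {'transcript_id': 'CUFF.1.1'}."""
    return dict(ATTR_RE.findall(field))


def gtf_to_match_gff3(gtf_lines, match_type="match"):
    """Regroup GTF exon lines into GFF3 match/match_part lines (hypothetical helper).

    Transcript-level lines are ignored; each match spans its exons.
    GFF3 attribute escaping (; = & ,) is not handled.
    """
    exons = defaultdict(list)  # transcript_id -> [(seqid, source, start, end, strand), ...]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # skip 'transcript' lines and anything malformed
        tid = parse_gtf_attributes(cols[8]).get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((cols[0], cols[1], int(cols[3]), int(cols[4]), cols[6]))

    out = ["##gff-version 3"]
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[2])  # order exons by start coordinate
        seqid, source, start, _, strand = parts[0]
        end = max(p[3] for p in parts)
        out.append("\t".join([seqid, source, match_type, str(start), str(end),
                              ".", strand, ".", f"ID={tid};Name={tid}"]))
        for i, (_, _, p_start, p_end, _) in enumerate(parts, 1):
            out.append("\t".join([seqid, source, "match_part", str(p_start), str(p_end),
                                  ".", strand, ".", f"ID={tid}:part{i};Parent={tid}"]))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gtf_to_match.py transcripts.gtf > transcripts.match.gff3
    with open(sys.argv[1]) as handle:
        print("\n".join(gtf_to_match_gff3(handle)))
```

Comparing this kind of output with what cufflinks2gff3 produces for the problem region may help show whether the conversion step, rather than the RNA-seq evidence itself, explains why the gene is not being called.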
From carsonhh at gmail.com Fri Sep 27 04:48:29 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Sep 2013 06:48:29 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

From: Carson Holt
Date: Friday, September 27, 2013 6:42 AM
To:
Subject: Re: [maker-devel] maker2 scripts for functional annotation

If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing is the time spent verifying reading frame, etc.). If you leave other things on in the control files, then you will get a lot of other processes, like a standard MAKER run.

Thanks,
Carson

From:
Date: Friday, September 27, 2013 4:34 AM
To: Carson Holt
Subject: Re: [maker-devel] maker2 scripts for functional annotation

Hi... Xia and Carson

I've been trying to do something similar to get maker gene models derived from CEGMA predictions, and thought it would be nice to use the CEGMA GFF rather than the protein FASTA, as that includes exon structure. The CEGMA output is a GFFv2 variant, and I managed to get this into GFFv3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags (the tags came out as EMBL/databank_entry, mRNA and CDS; not sure if this is valid for pred_gff or not).

If you run maker with pred_gff=my_file and keep_preds=1 with est2genome and protein2genome switched off, then you get a lot of est2genome and BLAST activity (I also had pred_stats=1 on one run). You can prevent most of this by removing the est and protein files from the config :-).
However, without EST and protein evidence you get no gene models, so (I guess - I'm new to maker also, Carson please correct me if I'm wrong) if you've already run est2genome and protein2genome, then pred_gff could be used to convert your GFF to maker models, if you filter the maker gene models by source. AFAICS, if you have est and protein data configured and est2genome and protein2genome switched off, then maker will use these as evidence for your GFF, which means it will have to align them, which could be mistaken for running those analyses.

Hope this helps and apologies if I'm wrong!

From carsonhh at gmail.com Fri Sep 27 04:48:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Sep 2013 06:48:52 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

So to give a little background to this, the question was how to turn match/match_part into gene/mRNA/exon/CDS like the old gff3_preds2models script.
The steps below will basically just turn maker into a feature type converter and ignore all its other capabilities. That being said, depending on what your final goal is, you might actually want to be running something a different way, but if your only goal is to blindly convert feature types, then those steps will work.

Thanks,
Carson

From: Carson Holt
Date: Friday, September 27, 2013 6:42 AM
To:
Subject: Re: [maker-devel] maker2 scripts for functional annotation

If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing is the time spent verifying reading frames, etc.). If you leave other things on in the control files, then you will get a lot of other processes, like a standard MAKER run.

Thanks,
Carson

From:
Date: Friday, September 27, 2013 4:34 AM
To: Carson Holt
Subject: Re: [maker-devel] maker2 scripts for functional annotation

Hi Xia and Carson,

I've been trying to do something similar to get maker gene models derived from CEGMA predictions, and thought it would be nice to use the CEGMA GFF rather than the protein FASTA, as that includes exon structure. The CEGMA output is a GFFv2 variant, and I managed to get this into GFF3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags. (The tags came out as EMBL/databank_entry, mRNA and CDS; I'm not sure whether this is valid for pred_gff or not.)

If you run maker with pred_gff=my_file and keep_preds=1 with est2genome and protein2genome switched off, then you get a lot of est2genome and BLAST activity. (I also had pred_stats=1 on one run.) You can prevent most of this by removing the EST and protein files from the config :-).
However, without EST and protein evidence you get no gene models, so (I guess; I'm new to maker also, Carson please correct me if I'm wrong) if you've already run est2genome and protein2genome, then pred_gff could be used to convert your GFF to maker models, if you filter the maker gene models by source. AFAICS, if you have EST and protein data configured and est2genome and protein2genome switched off, then maker will use these as evidence for your GFF, which means it will have to align them, which could be mistaken for running those analyses.

Hope this helps, and apologies if I'm wrong!

On Wednesday, 25 September 2013 15:35:46 UTC+1, Carson Holt wrote:
> If it is launching predictors, then you have snaphmm or augustus_species set. You need to blank out all other options in the control files (including repeat masking options, proteins, ESTs, etc.) when trying to convert match/match_part to gene/mRNA/exons/CDS, or else those other programs will run.
>
> --Carson
>
> On 9/25/13 10:31 AM, "Xia... at dupont.com" wrote:
>
>> Hi Carson,
>>
>> Thank you for the message and your kind help. We tested maker2 by setting keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun. But it seemed maker2 started to launch all predictors again, and it took a long time to finish. I wonder if there is any way that we can directly get a gene/mRNA/exons/CDS GFF file without re-running maker2 to convert match/match_part features into gene/mRNA/exons/CDS.
>>
>> Thanks,
>> Xia
>>
>> -----Original Message-----
>> From: Carson Holt [mailto:cars... at gmail.com]
>> Sent: Thursday, September 19, 2013 5:58 PM
>> To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY; maker... at yandell-lab.org
>> Subject: Re: [maker-devel] maker2 scripts for functional annotation
>>
>> Hello Corban & Xia,
>>
>> Some scripts like gff3_preds2models are deprecated.
>> To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS.
>>
>> For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts, then add putative gene functions via BLASTP to UniProt and the maker_functional_fasta and maker_functional_gff scripts.
>>
>> Go ahead and take a look at those tools, and let me know if you have any questions or need help configuring them.
>>
>> Thanks,
>> Carson
>>
>> On 9/19/13 11:53 AM, "Mark Yandell" wrote:
>>
>>> Hi Corban & Xia,
>>>
>>> I've forwarded your question along to the MAKER_dev list, where you can get speedy answers to your maker-related questions. Thanks for using MAKER.
>>>
>>> --mark
>>>
>>> Mark Yandell
>>> Professor of Human Genetics
>>> H.A. & Edna Benning Presidential Endowed Chair
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ph: 801-587-7707
>>>
>>> ________________________________________
>>> From: Xia... at dupont.com [Xia... at dupont.com]
>>> Sent: Thursday, September 19, 2013 11:49 AM
>>> To: Mark Yandell; Corban-Gre... at dupont.com
>>> Subject: maker2 scripts for functional annotation
>>>
>>> Dr. Yandell,
>>>
>>> We were recently evaluating maker2 for annotation and going through the maker tutorial from 2012.
>>>
>>> http://gmod.org/wiki/MAKER_Tutorial_2012
>>>
>>> The tutorial makes references to some scripts that we couldn't find in the current release.
>>> We were looking for scripts like gff3_preds2models to convert match/match_part format into annotations with gene/mRNA/exons/CDS and others. I was wondering if maybe we did not have the most up-to-date version.
>>>
>>> In addition to getting accurate gene annotations, I was looking for a solution to get functional assignments. I see that there are some scripts like maker_functional_fasta that may help, but I was wondering what you would recommend.
>>>
>>> Thanks,
>>>
>>> Corban & Xia

From uqslizbe at uq.edu.au Thu Sep 5 01:30:26 2013
From: uqslizbe at uq.edu.au (Selene Lizbeth Fernandez Valverde)
Date: Thu, 5 Sep 2013 17:30:26 +1000
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks
Message-ID:

Hi all,

I'm currently using Maker to reannotate the genome of the marine sponge.
We already have a set of Augustus predictions and gene models that I mapped back to the genome using the patched map2assembly script posted on the mailing list, as well as PASA transcripts (based on Trinity assemblies) and Cufflinks transcripts. I would like to include both Trinity and Cufflinks, as in some cases one outperforms the other. I'm currently planning to provide the Trinity/PASA assemblies as FASTA to the "est" option and the Cufflinks assemblies as GFF3 using the "est_gff" option, but I'm wondering if MAKER will take into account both types of evidence? Would it be better to merge both the PASA and Cufflinks GFF3s using gff3_merge?

Thanks in advance for the advice,
Selene

**est_gff/est --> These are assumed to be correctly assembled and aligned around splice sites (MAKER uses exonerate to align around splice sites for ESTs in FASTA files). MAKER can use them to infer gene models directly (est2genome option), can use them as support for maintaining predictions, and can use them to modify structure and add UTR to predictions. If you let MAKER try and find alternative splice forms, they will be used to identify support for splice variants. How these cluster with other evidence will help MAKER infer gene boundaries in some cases. MAKER will also use splice sites inferred from the ESTs to inform gene predictors during the prediction step.

Selene Fernandez-Valverde Ph.D.
Postdoctoral Research Fellow
School of Biological Sciences
University of Queensland
St Lucia QLD 4072 Australia
uqslizbe at uq.edu.au

From carsonhh at gmail.com Thu Sep 5 05:04:43 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 07:04:43 -0400
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks
In-Reply-To:
Message-ID:

1. I'm wondering if MAKER will take into account both types of evidence?
Yes.

2. Would it be better to merge both PASA and cufflinks gff3s using gff3_merge?
You can provide them as a comma-separated list of files to the est_gff= option, or you can merge them using the gff3_merge script that comes with MAKER. Unfortunately, I have no single best option for which evidence types to include. Every evidence type can contribute in its own way to the final results. When you test using different evidence types, try running on a single large contig and manually view the results in a browser.

Thanks,
Carson

From j.zohren at qmul.ac.uk Thu Sep 5 09:58:39 2013
From: j.zohren at qmul.ac.uk (Jasmin Zohren)
Date: Thu, 5 Sep 2013 16:58:39 +0100
Subject: [maker-devel] Maker in the cloud
Message-ID: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Dear Maker developers,

I've already contacted you a while ago about my annotation of the birch genome (Betula nana). As I am constantly running into problems using our cluster facilities at QMUL, I thought of moving into the cloud. As I am rather inexperienced in cloud computing, I have several questions:

1. To me it seems that there are two different Maker images on EC2, ami-ea661f83 and ami-b10abed8. Which one is "the right one"?
2. Can I use this Maker AMI for the annotation of a whole genome, or is it only suitable for the tutorial tasks?
3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance; maybe that wasn't enough computing power? Or do you have another explanation for this?
4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I've also recently heard about a parallelisation module for use in the cloud developed by Era7, called "nispero", but I am not sure whether it is publicly available yet.
5. Do you have any experience of how long an annotation task in the cloud would take, and also what the expected costs would be? The birch genome is only 500 MB in size, and currently I am simply annotating it with a SNAP-trained HMM. However, in the future I will feed it with RNAseq data as well.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London

intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk

From carsonhh at gmail.com Thu Sep 5 10:26:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 12:26:08 -0400
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hello Jasmin,

I haven't used MAKER in parallel on the cloud before (just tutorial images); however, I believe there is an iPlant Atmosphere image available through iPlant with MAKER version 2.27. You can get a maximum of 16 CPUs per instance there.
--> http://www.iplantcollaborative.org/discover/atmosphere

Alternatively, if you have any US-based collaborators, you can apply for a startup allocation on the Lonestar cluster via XSEDE (an allocation can be requested by any US-based researcher and only takes a few days to approve).
--> https://www.xsede.org/

That cluster was used recently to process the largest genome ever annotated (the pine genome). Total run time will be less than a day on that cluster, because you can request thousands of CPUs for your job with very short queue wait times.

There is also work in progress to give access to MAKER on the same cluster via the iPlant Discovery Environment. I've CC'd Joshua Stein, who can correct me if I'm wrong, but I believe that resource would be available to non-US-based researchers as well, and will be available in the very near future (potentially within the next month or less).
Perhaps someone else on the mailing list may want to share their experience using MAKER on the cloud?

Thanks,
Carson

From barry.moore at genetics.utah.edu Thu Sep 5 12:06:05 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 5 Sep 2013 12:06:05 -0600
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
References: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hi Jasmin,

Like Carson, my only significant experience with MAKER in the cloud is using it for our training; however, I'll make some comments based on experience on the cloud with some of our other tools:

There are several cloud architectures available now, but I only have experience with Amazon EC2, so all comments are only relevant there.

I wouldn't use any of the existing MAKER AMIs. All of them were created for tutorial purposes, and while they should work fine for a real annotation job, they will be out of date. At the very least, if you use one, start with it, but install current MAKER code and save it as a new AMI. You can use MPI on the Amazon nodes, but it's not set up by default to run MPI between nodes. That can presumably be done, but we haven't done it, so there may be headaches involved; we just don't know for sure. However, you could split your input FASTA into several chunks of roughly equal size and fire up a different EC2 node for each FASTA file, then allow maker to use MPI to optimize parallelization on each node individually.
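The FASTA-splitting approach Barry suggests can be sketched in a few lines of Python. The helpers below (`read_fasta`, `split_fasta`, `write_fasta`) are hypothetical, not MAKER scripts: they keep every contig whole, since each contig is annotated as a separate unit, and balance total sequence length across chunks with a greedy largest-first assignment. That heuristic is one reasonable choice among several, not the method anyone in this thread used.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) records."""
    records: List[Record] = []
    header: Optional[str] = None
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_fasta(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy largest-first split into n_chunks of similar total length.

    Contigs are never cut, so one very long scaffold can still dominate a chunk.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bases assigned, chunk index)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)  # currently lightest chunk
        chunks[i].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), i))
    return chunks


def write_fasta(records: List[Record], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, reading genome.fasta with `read_fasta(open("genome.fasta"))`, calling `split_fasta(records, 8)` and writing each chunk to its own file would give one input per EC2 node, each of which can then run maker under MPI locally.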
MAKER is really good at restarting if things fail, so with that in mind I'd suggest staring spot nodes which can be 10X cheaper than regularly priced nodes. Amazon will kill a spot node as soon as someone comes along who is willing to pay full price, so you'd want a way (either manually checking and restarting nodes or scripting a AWS API solution) to check whether nodes finished and restart them if they did not, but you could save a lot of money by doing this. B On Sep 5, 2013, at 9:58 AM, Jasmin Zohren wrote: > Dear Maker developers, > > I?ve already contacted you a while ago about my annotation of the birch genome (Betula nana). As I am constantly running into problems using our cluster facilities at QMUL I thought of moving into the cloud. As I am rather inexperienced in cloud computing I have several questions: > > 1. To me it seems that there are two different Maker images on EC2 ? ami-ea661f83 and ami-b10abed8 ? which one is ?the right one?? > 2. Can I use this Maker AMI for the annotation of a whole genome or is it only suitable for the tutorial tasks? > 3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance, maybe that wasn?t enough computing power? Or do you have another explanation for this? > 4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I?ve also recently heard about a parallelisation module for use in the cloud developed by Era7, called ?nispero?. But I am not sure whether it is publicly available yet. > 5. Do you have any experience of how long an annotation task in the cloud would take and also what the expected costs would be? The birch genome is only 500 MB in size and currently I am simply annotating it with a SNAP trained HMM. 
However, in the future I will feed it with RNAseq data as well. > > Many thanks in advance and kind regards, > Jasmin > > ----------------------------- > Jasmin Zohren > PhD student in the INTERCROSSING ITN > Queen Mary University of London > > intercrossing.wikispaces.com > evolve.sbcs.qmul.ac.uk > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ejr at stowers.org Fri Sep 6 12:34:32 2013 From: ejr at stowers.org (Ross, Eric) Date: Fri, 6 Sep 2013 18:34:32 +0000 Subject: [maker-devel] maker-devel Digest, Vol 64, Issue 4 In-Reply-To: Message-ID: It wouldn't be too difficult to run MAKER to run using something like starcluster. Starcluster manages the cluster and nodes for you. http://star.mit.edu/cluster/ It's not too difficult to use. Eric -- Eric Ross Bioinformatic Specialist I Alejandro S?nchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org On 9/6/13 1:00 PM, "maker-devel-request at yandell-lab.org" wrote: >Send maker-devel mailing list submissions to > maker-devel at yandell-lab.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >or, via email, send a message with subject or body 'help' to > maker-devel-request at yandell-lab.org > >You can reach the person managing the list at > maker-devel-owner at yandell-lab.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of maker-devel digest..." > > >Today's Topics: > > 1. 
Re: Maker in the cloud (Barry Moore) > > >---------------------------------------------------------------------- > >Message: 1 >Date: Thu, 5 Sep 2013 12:06:05 -0600 >From: Barry Moore >To: Jasmin Zohren >Cc: maker-devel at yandell-lab.org >Subject: Re: [maker-devel] Maker in the cloud >Message-ID: >Content-Type: text/plain; charset="windows-1252" > >Hi Jasmin, > >Like Carson, my only significant experience with MAKER in the cloud is >using it for our training, however, I'll add make some comments based on >experience on the cloud with some of our other tools: > >There are several cloud architectures available now, but I only have >experience with Amazon EC2, so all comments are only relevant there. > >I wouldn't use any of the existing MAKER AMIs. All of them were created >for tutorial purposes, and while they should work fine for a real >annotation job, they will be out of date. At the very least if you use >one, start with it, but install current MAKER code and save it as a new >AMI. You can use MPI on the Amazon nodes, but it's not set up by default >to run MPI between nodes. That, can presumably be done but we haven't >done it, so there may be headaches involved we just don't know for sure. >However, you could split your input fasta into several chunks of roughly >equal size and fire up a different EC2 node for each fasta file, then >allow maker to use MPI to optimize parallelization on each node >individually. MAKER is really good at restarting if things fail, so with >that in mind I'd suggest staring spot nodes which can be 10X cheaper than >regularly priced nodes. Amazon will kill a spot node as soon as someone >comes along who is willing to pay full price, so you'd want a way (either >manually checking and restarting nodes or scripting a AWS API solution) >to check whether nodes finished and restart them if they did not, but you >could save a lot of money by doing this. 
> >B > >On Sep 5, 2013, at 9:58 AM, Jasmin Zohren wrote: > >> Dear Maker developers, >> >> I contacted you a while ago about my annotation of the birch >>genome (Betula nana). As I am constantly running into problems using our >>cluster facilities at QMUL, I thought of moving into the cloud. As I am >>rather inexperienced in cloud computing, I have several questions: >> >> 1. To me it seems that there are two different Maker images on >>EC2, ami-ea661f83 and ami-b10abed8. Which one is "the right one"? >> 2. Can I use this Maker AMI for the annotation of a whole genome >>or is it only suitable for the tutorial tasks? >> 3. Also, when I followed the steps outlined in the tutorial, >>there seemed to be a problem with RepeatMasker. Although Maker would run >>and produce output files, the log file stated that the contig had failed >>after the second attempt. I launched the image on a t1.micro instance; >>maybe that wasn't enough computing power? Or do you have another >>explanation for this? >> 4. Would it be possible to run the annotation in parallel (e.g. >>using MPICH2) in the cloud? I've also recently heard about a >>parallelisation module for use in the cloud developed by Era7, called >>"nispero", but I am not sure whether it is publicly available yet. >> 5. Do you have any experience of how long an annotation task in >>the cloud would take and also what the expected costs would be? The >>birch genome is only 500 Mb in size and currently I am simply annotating >>it with a SNAP-trained HMM. However, in the future I will feed it with >>RNAseq data as well.
>> >> Many thanks in advance and kind regards, >> Jasmin >> >> >> ----------------------------- >> Jasmin Zohren >> PhD student in the INTERCROSSING ITN >> Queen Mary University of London >> >> intercrossing.wikispaces.com >> evolve.sbcs.qmul.ac.uk >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >Barry Moore >Research Scientist >Dept. of Human Genetics >University of Utah >Salt Lake City, UT 84112 >-------------------------------------------- >(801) 585-3543 > > > > >-------------- next part -------------- >An HTML attachment was scrubbed... >URL: >nts/20130905/bf35206e/attachment-0001.html> > >------------------------------ > >Subject: Digest Footer > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > >------------------------------ > >End of maker-devel Digest, Vol 64, Issue 4 >****************************************** From bhall7 at hawaii.edu Wed Sep 11 14:23:28 2013 From: bhall7 at hawaii.edu (Brian Hall) Date: Wed, 11 Sep 2013 10:23:28 -1000 Subject: [maker-devel] Question about phase for CDS with start codon Message-ID: <5230D140.7010804@hawaii.edu> Aloha, I'm working with a gff produced by maker. (I didn't run the program myself, but I believe it was version 2.24.) Here are the lines in question:

scaffold00033 maker CDS 729494 729949 . - 2 ID=107343;Name=BDOR_005037-RC:cds:250;Parent=107334
scaffold00033 maker start_codon 729947 729949 . - . ID=107349;Name=BDOR_005037-RB:start1;Parent=107334

If I understand correctly, the start codon in this reverse-strand CDS is from position 729949 to 729947 -- the first three bases in the CDS. However, the phase value for the CDS is 2, which essentially skips the start codon.
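Brian's reasoning can be checked with simple arithmetic. In GFF3, phase is the number of bases to skip from the 5' end of a CDS segment to reach the first base of the next codon. On the minus strand, the 5' end is the highest coordinate. So the 5'-most segment of a CDS that begins with its start codon should have phase 0.

The Python sketch below recomputes phases from segment coordinates. It is an illustration, not MAKER or tbl2asn code. It assumes that all segments belong to one transcript and that the CDS is complete at its 5' end.

```python
from typing import List, Tuple


def cds_phases(segments: List[Tuple[int, int]], strand: str) -> List[Tuple[int, int, int]]:
    """Recompute GFF3 phase for each CDS segment of one transcript.

    Segments are (start, end), 1-based inclusive, and are walked 5'->3':
    ascending coordinates on '+', descending on '-'. Assumes the 5'-most
    segment begins exactly at the first base of a codon (e.g. the start codon).
    """
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    result = []
    upstream = 0  # coding bases in all segments 5' of the current one
    for start, end in ordered:
        phase = (3 - upstream % 3) % 3  # bases to skip to reach the next codon start
        result.append((start, end, phase))
        upstream += end - start + 1
    return result
```

Suppose 729494-729949 is the 5'-most CDS segment of its transcript. Then `cds_phases([(729494, 729949)], "-")` gives `[(729494, 729949, 0)]`, and the start codon occupies 729947-729949, matching the start_codon line. A phase of 2 on that segment would therefore be inconsistent.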
Downstream software (tbl2asn) is kicking up a "missing start codon" error. I have several hundred such issues in the gff for a single genome. They generally only occur on reverse-strand CDSs. Any ideas? Sincerest apologies if this is a duplicate question or if I've provided incomplete information. I am new at this. Thanks for your help! --Brian From ckuanglim at gmail.com Wed Sep 11 23:42:38 2013 From: ckuanglim at gmail.com (Chan Kuang Lim) Date: Thu, 12 Sep 2013 13:42:38 +0800 Subject: [maker-devel] Exon Type in MAKER GFF Output Message-ID: Dear Maker developers, I have a question regarding the GFF output of MAKER. When we look at CDS and Exon, we do not know whether they are initial, internal, terminal or single. How can we capture the exon type from MAKER output? Thanks, Chan KL -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Sep 12 08:21:48 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 12 Sep 2013 10:21:48 -0400 Subject: [maker-devel] Exon Type in MAKER GFF Output In-Reply-To: Message-ID: That information is not explicit in GFF3 format. You have to capture all exons parented onto the mRNA, then sort them to identify if the exon is 5-prime, 3-prime, internal, or single exon. --Carson From: Chan Kuang Lim Date: Thursday, September 12, 2013 1:42 AM To: Subject: [maker-devel] Exon Type in MAKER GFF Output Dear Maker developers, I have a question regarding the GFF output of MAKER. When we look at CDS and Exon, we do not know whether they are initial, internal, terminal or single. How can we capture the exon type from MAKER output? Thanks, Chan KL _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
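Carson's suggestion is to gather all exons parented onto an mRNA, then sort them to find each exon's position. The Python sketch below illustrates that approach. The helper names are hypothetical. It assumes tab-separated GFF3 with exons linked to mRNAs through the Parent attribute. Lines that do not have nine columns are skipped, such as sequence lines after ##FASTA.

Sorting must respect strand: on the minus strand, the initial (5') exon is the one with the highest coordinates.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive


def classify_exons(exons: List[Exon], strand: str) -> Dict[Exon, str]:
    """Label each exon of one mRNA as single, initial (5'), internal, or terminal (3')."""
    # Transcription order: ascending coordinates on '+', descending on '-'.
    ordered = sorted(exons, key=lambda e: e[0], reverse=(strand == "-"))
    if len(ordered) == 1:
        return {ordered[0]: "single"}
    labels: Dict[Exon, str] = {}
    for i, exon in enumerate(ordered):
        if i == 0:
            labels[exon] = "initial"
        elif i == len(ordered) - 1:
            labels[exon] = "terminal"
        else:
            labels[exon] = "internal"
    return labels


def exons_by_parent(gff_lines: List[str]) -> Dict[str, Tuple[str, List[Exon]]]:
    """Group exon features by Parent ID: {parent: (strand, [(start, end), ...])}."""
    groups: Dict[str, Tuple[str, List[Exon]]] = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):  # an exon may have several parents
            if parent:
                strand, coords = groups.setdefault(parent, (cols[6], []))
                coords.append((int(cols[3]), int(cols[4])))
    return groups
```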
URL: From carsonhh at gmail.com Thu Sep 12 09:27:44 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 12 Sep 2013 11:27:44 -0400 Subject: [maker-devel] Question about phase for CDS with start codon In-Reply-To: <5230D140.7010804@hawaii.edu> Message-ID: I know there was an incorrect phase issue in a previous maker version that is now fixed, but I really doubt that is the issue causing your error. What are you using to convert from GFF3 to tbl format before using tbl2asn? I'd start there. We can send you a GFF3 to tbl converter if that will help. --Carson On 9/11/13 4:23 PM, "Brian Hall" wrote: >Aloha, > >I'm working with a gff produced by maker. (I didn't run the program >myself, but I believe it was version 2.24.) Here are the lines in >question: > >scaffold00033 maker CDS 729494 729949 . - 2 >ID=107343;Name=BDOR_005037-RC:cds:250;Parent=107334 >scaffold00033 maker start_codon 729947 729949 . - . >ID=107349;Name=BDOR_005037-RB:start1;Parent=107334 > >If I understand correctly, the start codon in this reverse-strand CDS is >from position 729949 to 729947 -- the first three bases in the CDS. >However, the phase value for the CDS is 2, which essentially skips the >start codon. Downstream software (tbl2asn) is kicking up a "missing >start codon" error. > >I have several hundred such issues in the gff for a single genome. They >generally only occur on reverse-strand CDSs. Any ideas? > >Sincerest apologies if this is a duplicate question or if I've provided >incomplete information. I am new at this. Thanks for your help! > >--Brian > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Fri Sep 13 02:15:29 2013 From: marc.hoeppner at imbim.uu.se (Marc P.
Hoeppner) Date: Fri, 13 Sep 2013 10:15:29 +0200 Subject: [maker-devel] Maker pass-through behavior Message-ID: <5232C9A1.3060709@imbim.uu.se> Dear list, I have started using Maker to explore its use for a number of genome projects we are planning on running. One of the tools we intend to incorporate into our pipeline is PASA (since we will be using Trinity, etc.). I would like to pass the (cleaned) output with predicted gene structures to Maker as pass-through annotation (I am optimistic that way...), but I noticed that doing so does not always result in the incorporation of the PASA gene model into the final maker annotation track. Sometimes it seems to be superseded by an Augustus/Maker model; sometimes the region stays empty (even though a protein alignment is present). So my question is how, exactly, Maker handles pass-throughs. Can it reject pass-throughs, or should it always use such models over any other data source? Is there any scenario where it wouldn't? I understand that Maker uses some internal scoring system to estimate the accuracy of an annotation; could that be a reason? It would be a bit odd though, since a lift-over from chicken (to our bird genome) seems to support gene models produced by PASA, yet they are nowhere to be found in the final models. And a related question: is there comprehensive documentation where I can get more information on the internal decision-making process of Maker? Or do I have to dig into the code for that? Cheers, Marc PS I have attached a screenshot of such an example - the green track is Maker with proteins + augustus (chicken models) + PASA pass-through of a cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins and trinity transcripts as EST evidence, Black: chicken lift-overs from EnsEMBL) -------------- next part -------------- A non-text attachment was scrubbed...
Name: igv_snapshot.png Type: image/png Size: 50142 bytes Desc: not available URL: From carsonhh at gmail.com Sun Sep 15 12:39:29 2013 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 15 Sep 2013 14:39:29 -0400 Subject: [maker-devel] Maker pass-through behavior In-Reply-To: <5232C9A1.3060709@imbim.uu.se> Message-ID: > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? pred_gff is treated the same as any other ab initio prediction. It is just one among several candidate gene models. The model that is kept is the one with the lowest AED score (lower means better evidence match/support). Any model with no evidence support (AED=1) will be rejected unless keep_preds=1 is set. There is also another score, eAED, which takes into account the protein reading frame (protein evidence must be in the same reading frame as the gene model). A model with eAED=1 will also be rejected. > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? Possibly. Look at the pass-through model in the final MAKER GFF3 to see what its AED score was. If you send me the GFF3 along with a list of the regions you are concerned about, I can tell you more. Also consider giving the PASA results to est_gff as well, to bias the scoring algorithm toward keeping those models (i.e. the model supports itself, which is reasonable since these are EST-derived anyway and not just ab initio predictions). Also, the model_gff option will always keep an input model (with or without evidence support) and will only replace it with something else if that something else has a better AED score. > > > > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker?
Or do I have to dig into the code for that? Look at these two papers --> Holt, C., and Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491. Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67. Thanks, Carson On 9/13/13 4:15 AM, "Marc P. Hoeppner" wrote: > Dear list, > > I have started using Maker to explore its use for a number of genome > projects we are planning on running. One of the tools we intend on > incorporating into our pipeline is PASA (Since we will be using Trinity > etc). The (cleaned) output with predicted gene structures I would like > to pass to Maker as pass-through annotation (I am optimistic that > way...) - but I noticed that doing so does not always result in the > incorporation of the PASA gene model into the final maker annotation > track. Sometimes it seems to be superseded by an Augustus/Maker model, > sometimes the region stays empty (even tho a protein alignment is present). > > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? > > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? It would be a > bit odd tho, since a lift-over from chicken (to our bird genome) seems > to support gene models produced by PASA, yet they are nowhere to be > found in the final models. > > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker? Or do I have to dig into the code for that? 
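Carson's selection rule is to keep the candidate with the lowest AED and to reject models with AED=1 unless keep_preds=1 is set. It can be illustrated with the nucleotide-level definition in Eilbeck et al. (2009):

AED = 1 - (SN + SP)/2

where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases supported by evidence.

The Python sketch below is a simplification, not MAKER's implementation. Real MAKER pools several evidence types, also computes eAED, and works from splice-aware alignments. Here, a model and its evidence are just lists of (start, end) intervals, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of base positions they cover."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model: List[Interval], evidence: List[Interval]) -> float:
    """Nucleotide-level AED = 1 - (SN + SP) / 2 (after Eilbeck et al., 2009)."""
    m, e = covered_bases(model), covered_bases(evidence)
    if not m or not e:
        return 1.0  # nothing to compare: treat as unsupported
    overlap = len(m & e)
    sn = overlap / len(e)  # share of evidence bases covered by the model
    sp = overlap / len(m)  # share of model bases supported by evidence
    return 1.0 - (sn + sp) / 2.0


def pick_model(candidates: Dict[str, List[Interval]], evidence: List[Interval],
               keep_preds: bool = False) -> Optional[Tuple[str, float]]:
    """Return (name, AED) of the best candidate, or None if it is unsupported (AED == 1)."""
    scored = sorted((aed(exons, evidence), name) for name, exons in candidates.items())
    if not scored:
        return None
    best_aed, best_name = scored[0]
    if best_aed >= 1.0 and not keep_preds:
        return None
    return best_name, best_aed
```

Under this rule, a pass-through model that matches the evidence slightly worse than an ab initio model loses, even if it looks plausible.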
> > Cheers, > > Marc > > PS I have attached a screenshot of such an example - the green track is > Maker with proteins + augustus (chicken models) + PASA pass-through of a > cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA > output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins > and trinity transcripts as EST evidence, Black: chicken lift-overs from > EnsEMBL) > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhinsley at ebi.ac.uk Mon Sep 16 03:51:35 2013 From: mhinsley at ebi.ac.uk (Malcolm Hinsley) Date: Mon, 16 Sep 2013 10:51:35 +0100 Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes In-Reply-To: <5232C9A1.3060709@imbim.uu.se> References: <5232C9A1.3060709@imbim.uu.se> Message-ID: <5236D4A7.6080303@ebi.ac.uk> Hello I'm trying to get maker to run on MPI using several nodes. I have an installation set up by a colleague which includes maker 2.27 and openmpi-1.4.3. Previously it has only been used (here at EBI) with maker processes running on one node only, but I find that it can wait a very long time before being scheduled by LSF. The command used to submit is like this (as per recommendations from systems; it uses 8 CPUs on each of 8 nodes):

export OMP_NUM_THREADS=64
bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1

and it requires the environment to be set up in ~/.bashrc for OpenMPI. This runs but produces a lot of errors like: DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../lib/GFFDB.pm line 407.
I've looked at https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/maker-devel/TscBgbQfBX4/pae016DqlIMJ which suggests that "It means that your GFF3 results will not be integrated" (I'm not sure what is meant by that, but the number of genes I'm getting is around 2k, whereas I expect more like 15k) and that the problem is SQLite on NFS (a known issue), and the fix is to use /tmp. I have TMP= set as per the default in maker_opts.ctl, and there are maker directories in /tmp on the runtime nodes, but the database (I guess) is in /nfs/...../maker//.scf.db. I don't see how I could set the working directory to a non-NFS file system and still use more than one node, but this error only seems to appear (so far) with est2genome, not when running SNAP/Augustus. Is there a workaround to stop getting the locked error, or some way to recover from it after maker has finished? Or is it necessary to run the est2genome step (or maker generally) on one node? An obvious option is to split the assembly, but I was hoping to avoid that. -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Sep 17 21:35:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Sep 2013 21:35:52 -0600 Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes In-Reply-To: <5236D4A7.6080303@ebi.ac.uk> Message-ID: Sorry for the slow reply; I'm currently traveling. Try deleting any *.db files in the maker output directory to force the SQLite database to be rebuilt. Also, you can try the current version of MAKER at yandell-lab.org. MAKER is supposed to try and copy the database to the /tmp directory before it starts work.
That way the actual working copy will be local, and will be independent for each node. --Carson From: Malcolm Hinsley Date: Monday, September 16, 2013 3:51 AM To: Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes Hello I'm trying to get maker to run on MPI using several nodes. I have an installation set up by a colleague which includes maker 2.27 and openmpi-1.4.3. Previously it has only been used (here at EBI) with maker processes running on one node only, but i find that it can wait a very long time before being scheduled by LSF. The command used to submit is like this (as per recommendations from systems) (uses 8 cpus on each of 8 nodes) export OMP_NUM_THREADS=64 bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1 and requires environment be set up in ~/.bashrc for openMPI. This runs but produces a lot of errors like: DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../li b/GFFDB.pm line 407. I've looked at https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/mak er-devel/TscBgbQfBX4/pae016DqlIMJ which suggests that "It means that your GFF3 results will not be integrated" (but i'm not sure what's meant by that, but the number of genes i'm getting is around 2k, expect more like 15k) and that the problem is SQLite using NFS (a known issue), and the fix is to use /tmp. I have TMP= set as per default in maker_opts.ctl, and there are maker directories in /tmp on the runtime nodes, but the database (i guess) is in /nfs/...../maker//.scf.db. I don't see how i could set the working directory to a non-NFS file systems and still use more than one node, but this error only seems to appear (so far) with est2genome, not when running SNAP/ Augustus. Is there a work around to stop getting the locked error or some way to recover from it after maker has finished? 
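Carson describes copying the database to /tmp so that each node works on a private local copy. That general technique can be illustrated with Python's standard sqlite3 module. This is a sketch of the technique only: MAKER itself is Perl and uses DBD::SQLite, and the function name and arguments here are hypothetical. It assumes the shared database is only read during the run, because changes made to the local copy are not propagated back to the shared file.

```python
import os
import shutil
import sqlite3
import tempfile
from typing import Optional


def open_local_copy(shared_db_path: str, tmp_root: Optional[str] = None) -> sqlite3.Connection:
    """Copy an SQLite file from shared storage to node-local temp space and open the copy.

    Each process works on its own private file, so SQLite's file locks are never
    contended over NFS. Writes to the copy are NOT propagated back to the shared file.
    """
    local_dir = tempfile.mkdtemp(prefix="gffdb_", dir=tmp_root)  # tmp_root=None -> system temp dir
    local_path = os.path.join(local_dir, os.path.basename(shared_db_path))
    shutil.copy2(shared_db_path, local_path)
    # timeout: seconds to wait on a busy lock before raising "database is locked"
    return sqlite3.connect(local_path, timeout=30.0)
```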
Or is it necessary to run the est2genome step (or maker generally) on one node? An obvious option is to split the assembly but i was hoping to avoid that. -- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Sep 17 21:57:12 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Sep 2013 21:57:12 -0600 Subject: [maker-devel] Unexpected results with correct_est_fusion In-Reply-To: Message-ID: This does sound like the likely result of gene fusion from the Trinity assemblies. One thing to look at is the number of coding exons compared to the other ant species: see whether the increase in exons is mostly in UTR, coding sequence, or both. One thing you could try is running MAKER without the EST evidence, just to see how many genes you get with protein-only support. There are ways to use multiple MAKER runs to tease out details of the data. For example:
run1: protein evidence only, plus ab initio predictors like snap and augustus.
run2: protein and EST evidence. Models from run1 are passed in as pred_gff with snap and augustus turned off (this will force the addition of UTR, but not the generation of new models). Use the correct_est_fusion=1 option here to clip UTR that runs into neighboring genes.
run3: protein and EST evidence plus augustus and snap.
Then take the models from run2, and the models from run3 that do not overlap run2, and add them all to your final set along with any models that come from InterProScan domain analysis of rejected models.
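The final merge step of this three-run recipe keeps all run2 models, plus the run3 models that do not overlap any run2 model. That step is simple interval logic, sketched below in Python. The sketch assumes gene models have already been reduced to (seqid, start, end, strand) records, for example from the gene lines of each run's GFF3; the type and function names are hypothetical. The recipe does not say whether overlap should be strand-specific, so that is a parameter here. The pairwise check is quadratic, which is fine for illustration; a whole genome would want an interval index.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    seqid: str
    start: int  # 1-based, inclusive
    end: int
    strand: str


def overlaps(a: Model, b: Model, same_strand: bool = False) -> bool:
    """True if the two models share at least one base (optionally only on the same strand)."""
    if a.seqid != b.seqid:
        return False
    if same_strand and a.strand != b.strand:
        return False
    return a.start <= b.end and b.start <= a.end


def merge_runs(run2: List[Model], run3: List[Model], same_strand: bool = False) -> List[Model]:
    """Keep every run2 model plus each run3 model that overlaps no run2 model."""
    extra = [m for m in run3 if not any(overlaps(m, k, same_strand) for k in run2)]
    return sorted(run2 + extra, key=lambda m: (m.seqid, m.start))
```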
This solution is rather lengthy, but it may avoid many of the problems you seem to be getting with gene merging even with jaccard_clip and correct_est_fusion turned on, because your ESTs would only contribute to the UTR and to models not found based solely on protein evidence (i.e. they would be ignored in cases where you get enough evidence from other sources). --Carson From: Benjamin Rubin Date: Tuesday, September 17, 2013 10:08 AM To: Carson Holt Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, The new version is working great. Thanks for your help. I do have another more general question. I am working on annotating a new ant genome (Pseudomyrmex gracilis) and the results that I am getting from MAKER are a bit unexpected. The number of genes produced by MAKER is ~14,300 while, as you may know, the seven published ant genomes have at least 16,000 genes (this number was improved by several hundred by turning on correct_est_fusion). Running the ab initio predictions through InterProScan yields ~900 additional genes for P. gracilis, so there are still substantially fewer genes found for this species. This difference on its own is not that unexpected; Pseudomyrmex likely diverged from the other sequenced ants over 100 million years ago, and the genome sequence itself is rather fragmented and incomplete. However, what is bothering me is that, despite having fewer genes, I am seeing substantially more exons (~92,000 as opposed to 78,000-85,000), and the total length of all proteins is more than a million amino acids longer in P. gracilis. It does not have unexpectedly long genes; the average gene length is just a bit higher. I have looked at the annotations of some conserved genes and found some apparently spurious exons merged with these genes. I say that they are spurious because they go beyond the end of the gene sequence in other species (ants and Drosophila).
Unfortunately, it appears that many of these spurious calls are primarily the result of blast hits to my EST data. The ESTs generally seem to blast to the genome a bit more often than expected. Partly as a result of the relatively high repeat content of my genome (~50% complex repeats) and partly because we only used two Illumina libraries, my genome sequence is quite fragmented (~280Mb in ~6,500 scaffolds). Note that the total genome length is estimated at 387Mb, so I am missing a fair amount but almost all CEGMA genes are present in the assembly so I have concluded that the missing sequence is predominantly repeats. I have no prior reason to expect that my EST library has anything wrong with it. I did a single Illumina lane of RNA-seq and assembled in Trinity with the jaccard_clip option on to reduce gene fusions. If you have any advice on how my gene predictions can be improved, I would really appreciate it. Have you heard of this kind of problem before? Is there a way to limit the influence of ESTs without discarding them entirely? Thanks so much for your help with the fusion bug and for any advice here. Ben On Wed, Sep 11, 2013 at 9:27 AM, Benjamin Rubin wrote: > Hi Carson, > > OK, I will try it and let you know how it goes. And thanks for the suggestion > about using always_complete as well. > > Thanks! > Ben > > > On Tue, Sep 10, 2013 at 9:45 PM, Carson Holt wrote: >> I think I have it fixed. Sorry it took so long, but my original fix actually >> created other odd behaviors so I had to track those down as well. >> >> You can download the test version with the fix by typing this on the command >> line --> >> >> svn co ********* >> >> user: ***** >> password: ***** >> >> Test it out and let me know. On the contig you sent me, I also set >> always_complete=1 as some of the hint based models were lacking start or stop >> codons. The results looked slightly better that way as well. 
>> >> Thanks, >> Carson >> >> >> >> From: Benjamin Rubin >> Date: Wednesday, September 4, 2013 10:07 AM >> To: Carson Holt >> >> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >> >> OK, great. Thanks for letting me know. >> >> Ben >> >> >> On Wed, Sep 4, 2013 at 9:00 AM, Carson Holt wrote: >>> I thought I'd give you an update on this. I've verified the bug and think >>> I've identified roughly where it's happening. I'll have a fix for you to >>> test soon. >>> >>> --Carson >>> >>> >>> From: Benjamin Rubin >>> >>> Date: Wednesday, August 28, 2013 4:16 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>> >>> Hi Carson, >>> >>> OK, I think I uploaded all of the necessary files. I made a directory named >>> "rubin_data" for everything. I included both the full genome file >>> ("ec_patch...") as well as a file for scaffold_1. For this scaffold, I get >>> 132 genes when correct_est_fusion is off and 35 when it is on. These results >>> are after running maker a first time with correct_est_fusion on and >>> retraining SNAP/Augustus on the results. The SNAP file is >>> "gracilis_round_1.hmm" and I think the necessary Augustus files are in the >>> "gracilis_jaccard_flank100_corrfusion_round_1_results" directory. I also >>> included gff files for scaffold_1 with and without correct_est_fusion turned >>> on. >>> >>> Let me know if there is anything else that I failed to upload. I really >>> appreciate your time. Thanks so much. >>> >>> Ben >>> >>> >>> On Wed, Aug 28, 2013 at 9:59 AM, Benjamin Rubin >>> wrote: >>>> Hi Carson, >>>> >>>> Yes, I would be happy to upload the necessary data. Just let me know the >>>> connection information. >>>> >>>> Thanks! 
>>>> Ben >>>> >>>> >>>> On Wed, Aug 28, 2013 at 8:09 AM, Carson Holt wrote: >>>>> Could you pick one contig where the number of genes shift dramatically and >>>>> upload that contig fasta together with your control files and any evidence >>>>> datasets used to one of our servers (I'm going to send you connection >>>>> details in a separate e-mail). I can then run with and without >>>>> correct_est_fusion to see if there is anything unexpected going on. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> >>>>> From: Benjamin Rubin >>>>> Date: Tuesday, August 27, 2013 10:59 AM >>>>> To: Carson Holt >>>>> Cc: >>>>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion >>>>> >>>>> Hi Carson, >>>>> >>>>> I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but >>>>> I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank >>>>> at 100). This is using only the first round with SNAP and Augustus trained >>>>> on the CEGMA genes. Is there anything else that I might be doing wrong? I >>>>> have attached my control file in case that could be useful. >>>>> >>>>> Thanks for the help! >>>>> Ben >>>>> >>>>> >>>>> On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: >>>>>> The correct_est_fusion option just clips UTR on overlapping genes. I >>>>>> suspect the real problem is setting pred_flank too low. If your lead in >>>>>> sequence to a gene is too short, ab initio predictors won't call it. So >>>>>> you are probably getting empty reports from SNAP/Augustus for the hint >>>>>> based predictions. Try increasing pred_flank to at least 150. Setting >>>>>> pred_flank too low will also limit how far MAKER will walk out along the >>>>>> edges initial alignments during the polishing step (exonerate). So >>>>>> setting it too low may also be causing you to lose some EST and protein >>>>>> alignments. 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> From: Benjamin Rubin >>>>>> Date: Monday, August 26, 2013 2:20 PM >>>>>> To: >>>>>> Subject: [maker-devel] Unexpected results with correct_est_fusion >>>>>> >>>>>> Hello developers, >>>>>> >>>>>> I am using MAKER 2.28 to annotate an ant genome. I provide protein >>>>>> sequence evidence from all seven of the other sequenced ant genomes and a >>>>>> de novo assembled transcriptome as EST evidence. I assembled the >>>>>> transcriptome using Trinity with the jaccard_clip option turned on to >>>>>> reduce gene fusions. Despite using this set of hopefully non-fused ESTs, >>>>>> I still have substantial fusion problems with the final annotation. >>>>>> Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. >>>>>> However, correct_est_fusion leads to the prediction of a much smaller >>>>>> number of genes (~5,000 instead of ~14,000). I am initially training both >>>>>> SNAP and Augustus using CEGMA genes and then retraining based on the >>>>>> first round of annotation. Both rounds of annotation yield the same low >>>>>> number (~5,000) of genes. It may also be worth mentioning that the number >>>>>> of exons is also far lower when using correct_est_fusion (~26,000 instead >>>>>> of ~90,000). >>>>>> >>>>>> Is this the expected behavior of correct_est_fusion? I was surprised that >>>>>> it reduced the predicted number of genes by such a large margin. I am >>>>>> concerned that I am using it incorrectly. Do you have any other >>>>>> suggestions for reducing gene merging? 
>>>>>> >>>>>> Thanks, >>>>>> Ben >>>>>> >>>>>> -- >>>>>> _____________________________________________________ >>>>>> Benjamin ER Rubin >>>>>> PhD Candidate >>>>>> Committee on Evolutionary Biology >>>>>> University of Chicago >>>>>> http://www.moreaulab.org/Benjamin_Rubin.html >>>>>> >>>>>> Division of Insects >>>>>> Zoology Department >>>>>> Field Museum of Natural History >>>>>> 1400 South Lake Shore Drive >>>>>> Chicago, IL 60605 >>>>>> USA >>>>>> Office: (312) 665-7776 >>>>>> _______________________________________________ maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> --
>> _____________________________________________________ >> Benjamin ER Rubin >> PhD Candidate >> Committee on Evolutionary Biology >> University of Chicago >> benrubin.org >> >> Division of Insects >> Zoology Department >> Field Museum of Natural History >> 1400 South Lake Shore Drive >> Chicago, IL 60605 >> USA >> Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From leshin at gmail.com Wed Sep 18 13:35:10 2013 From: leshin at gmail.com (Le-Shin Wu) Date: Wed, 18 Sep 2013 15:35:10 -0400 Subject: [maker-devel] running mpi MAKER Message-ID: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com> Hi, I am new to MAKER and have just started using it for genome annotation. I compiled the MAKER package with MPI support on our cluster. But when I used the "mpiexec -n 64 -hostfile $PBS_NODEFILE maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl" command to run my MPI MAKER job, I got a whole bunch of warning messages like the one below in my error log file. Does this warning mean that something is wrong? Thank you. (I requested 64 processors on two nodes.) STATUS: Processing and indexing input FASTA files... WARNING: Multiple MAKER processes have been started in the same directory.
Best
LW

From carsonhh at gmail.com Wed Sep 18 14:27:32 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Sep 2013 14:27:32 -0600
Subject: [maker-devel] running mpi MAKER
In-Reply-To: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com>
Message-ID:

It means either MAKER was not properly configured for MPI support, or the communication ring is not launching properly.

Three things:
1. In the .../maker/src/ directory, run './Build status'. Does it say MPI_SUPPORT is configured or installed?
2. Run 'which mpiexec' on the command line. What is the path? Is it MPICH2 mpiexec, OpenMPI, or something else?
3. Run 'mpiexec -n 64 -hostfile $PBS_NODEFILE hostname' on the command line. What does it print out?

Thanks,
Carson

From lewu at indiana.edu Wed Sep 18 19:30:49 2013
From: lewu at indiana.edu (Le-shin Wu)
Date: Wed, 18 Sep 2013 21:30:49 -0400
Subject: [maker-devel] running mpi MAKER
In-Reply-To:
References: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com>
Message-ID:

Hi Carson,

Thanks a lot for your information. When I run './Build status', it shows the output below, and it looks like MPI SUPPORT is enabled.
==============================================================================
STATUS MAKER 2.27
==============================================================================
PERL Dependencies:     VERIFIED
External Programs:     VERIFIED
External C Libraries:  VERIFIED
MPI SUPPORT:           ENABLED
MWAS Web Interface:    DISABLED
MAKER PACKAGE:         CONFIGURATION OK

But when I run 'which mpiexec', it shows "/N/soft/mason/openmpi/1.5.4/gcc/bin/mpiexec". So I think I did not use the correct mpiexec when running my MAKER job. Thanks again. I will try my MAKER job again with the correct mpiexec from MPICH2.

Best
LW
____________________________________________
Le-Shin Wu
Center for Computational Cytomics, Indiana University
http://www.cs.indiana.edu/~lewu
____________________________________________
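Carson's three checks can be scripted once their output has been captured to text. The Python sketch below is illustrative only: the helper names are hypothetical, the `MPI SUPPORT:` line format is taken from the `./Build status` output quoted above, and guessing the MPI implementation from the install path is a rough heuristic that MAKER itself does not use.

```python
# Sketch only: read captured output from the three MPI checks in this thread.
# Parsing rules are assumptions based on the text quoted here.
import re
from collections import Counter
from typing import List, Optional


def mpi_support(build_status: str) -> Optional[str]:
    """Return the value on the 'MPI SUPPORT:' line of './Build status' output."""
    match = re.search(r"MPI SUPPORT:\s*(\S+)", build_status)
    return match.group(1) if match else None


def guess_mpi_flavor(mpiexec_path: str) -> str:
    """Rough heuristic: infer the MPI implementation from the install path."""
    path = mpiexec_path.lower()
    if "openmpi" in path:
        return "OpenMPI"
    if "mpich" in path:
        return "MPICH"
    return "unknown"


def ranks_per_host(hostname_output: str) -> Counter:
    """Tally 'mpiexec -n N -hostfile FILE hostname' output (one line per rank)."""
    return Counter(line.strip() for line in hostname_output.splitlines() if line.strip())


def check_mpi_setup(build_status: str, mpiexec_path: str, hostname_output: str,
                    expected_ranks: int, built_with: str) -> List[str]:
    """Collect readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    support = mpi_support(build_status)
    if support is None or support.upper() == "DISABLED":
        problems.append("MAKER does not report MPI support")
    flavor = guess_mpi_flavor(mpiexec_path)
    if flavor != built_with:
        problems.append(f"mpiexec on PATH looks like {flavor}, "
                        f"but MAKER was built against {built_with}")
    total = sum(ranks_per_host(hostname_output).values())
    if total != expected_ranks:
        problems.append(f"expected {expected_ranks} ranks, hostname test printed {total}")
    return problems


if __name__ == "__main__":
    status = "MPI SUPPORT: ENABLED\nMWAS Web Interface: DISABLED"
    hosts = "node1\n" * 32 + "node2\n" * 32  # hypothetical 2-node, 64-rank output
    for problem in check_mpi_setup(status, "/N/soft/mason/openmpi/1.5.4/gcc/bin/mpiexec",
                                   hosts, expected_ranks=64, built_with="MPICH"):
        print("PROBLEM:", problem)
```

With the values from this thread, the only problem reported is the mismatch between the OpenMPI `mpiexec` on `PATH` and the MPICH2 build, which matches Le-Shin's own conclusion.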
From mhinsley at ebi.ac.uk Thu Sep 19 09:37:17 2013
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Thu, 19 Sep 2013 16:37:17 +0100
Subject: [maker-devel] 2.27 and 2.28 incompatible
Message-ID: <523B1A2D.7020300@ebi.ac.uk>

To try to fix SQL lock file errors, I installed 2.28 and made the mistake of running it in a directory made by 2.27 (to run SNAP and Augustus for the first time).

Every contig fails due to errors like:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Can't open file 04723bc0c22478764d90bbaebca96d23
STACK: Error::throw
STACK: Bio::Root::Root::throw /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/Root/Root.pm:472
STACK: Bio::DB::Fasta::fh /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:948
STACK: Bio::DB::Fasta::subseq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:929
STACK: Bio::PrimarySeq::Fasta::seq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:1089
STACK: FastaSeq::seq /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm:50
STACK: Process::MpiChunk::_go /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:478
STACK: Process::MpiChunk::run /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm:286
STACK: /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker:667
-----------------------------------------------------------
--> rank=NA, hostname=ebi3-198.ebi.ac.uk
 at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 38
    Error::_throw_Error_Simple('HASH(0x388cb78)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 306
    Error::subs::run_clauses('HASH(0x388cbf0)', '\x{a}------------- EXCEPTION: Bio::Root::Exception -------------\x{a}...', undef, 'ARRAY(0x38a0d18)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 426
    Error::subs::try('CODE(0x38f93f8)', 'HASH(0x388cbf0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm line 95
    FastaSeq::seq('FastaSeq=HASH(0x388dda0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 478
    Process::MpiChunk::_go('Process::MpiChunk=HASH(0x38a0e50)', 'run', 'HASH(0x38a0ec8)', 0, 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 341
    Process::MpiChunk::run('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 357
    Process::MpiChunk::run_all('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm line 286
    Process::MpiTiers::run_all('Process::MpiTiers=HASH(0x3867960)', 0) called at /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker line 667

Is there an easy way to reset the datastore/ file names so that I can switch over to 2.28 without starting over (e.g. maker -dsindex)?
I killed the job and ran 2.27 instead, which seems to be jim dandy.
--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com Thu Sep 19 10:06:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Sep 2013 10:06:09 -0600
Subject: [maker-devel] 2.27 and 2.28 incompatible
In-Reply-To: <523B1A2D.7020300@ebi.ac.uk>
Message-ID:

There is something very odd, because I've never seen those errors before, and 2.28 should use the same datastore structure as 2.27. I'm going to write a script that will print out certain configuration information about your install that might help me see what's going on. My plane is boarding now, so I'll send it to you later this evening.

Thanks,
Carson

From myandell at genetics.utah.edu Thu Sep 19 11:53:48 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Thu, 19 Sep 2013 17:53:48 +0000
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
References:
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu>

Hi Corban & Xia,

I've forwarded your question along to the MAKER_dev list, where you can get speedy answers to your MAKER-related questions. Thanks for using MAKER.

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Xia.Cao at dupont.com [Xia.Cao at dupont.com]
Sent: Thursday, September 19, 2013 11:49 AM
To: Mark Yandell; Corban-Gregory.Rivera at dupont.com
Subject: maker2 scripts for functional annotation

Dr. Yandell,

We were recently evaluating maker2 for annotation and going through the MAKER tutorial from 2012.

http://gmod.org/wiki/MAKER_Tutorial_2012

The tutorial makes references to some scripts that we couldn't find in the current release. We were looking for scripts like gff3_preds2models to convert match/match_part format into annotations with gene/mRNA/exons/CDS and others. I was wondering if maybe we did not have the most up-to-date version.

In addition to getting accurate gene annotations, I was looking for a solution to get functional assignments. I see that there are some scripts like maker_functional_fasta that may help, but I was wondering what you would recommend.

Thanks,

Corban & Xia

This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
The dupont.com web address will continue in use for a transitional period for communications sent or received on behalf of DuPont Performance Coatings., which is not affiliated in any way with the DuPont Company.

Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean

http://www.DuPont.com/corp/email_disclaimer.html

From carsonhh at gmail.com Thu Sep 19 15:58:16 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Sep 2013 15:58:16 -0600
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu>
Message-ID:

Hello Corban & Xia,

Some scripts like gff3_preds2models are deprecated. To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS.

For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts. Then add putative gene functions via BLASTP against UniProt and the maker_functional_fasta and maker_functional_gff scripts.

Go ahead and take a look at those tools and let me know if you have any questions or need help configuring them.

Thanks,
Carson
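To make the order of Carson's preferred functional-annotation route concrete, here is a Python sketch that only prints the shell commands; it runs nothing. The InterProScan 5 (`-i`, `-f`, `-iprlookup`, `-goterms`, `-o`) and BLAST+ (`-query`, `-db`, `-evalue`, `-outfmt`, `-out`) flags are standard, but the argument orders for `ipr_update_gff`, `iprscan2gff3`, `maker_functional_gff` and `maker_functional_fasta` are assumptions based on the MAKER tutorial, so check each script's usage message first. The file names and the `prefix` parameter are hypothetical.

```python
# Sketch only: prints the commands for the functional-annotation route
# described above; it runs nothing. Argument orders for the MAKER scripts are
# assumptions to check against each script's usage message.
import shlex
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (argv, file that receives stdout, if any)


def functional_annotation_steps(proteins: str, maker_gff: str, uniprot: str,
                                prefix: str = "annot") -> List[Step]:
    ipr_out = f"{prefix}.iprscan.tsv"
    blast_out = f"{prefix}.blastp"
    domain_gff = f"{prefix}.domains_added.gff"
    return [
        # 1. InterProScan 5: protein domains plus GO terms, tab-separated output
        (["interproscan.sh", "-i", proteins, "-f", "tsv", "-iprlookup", "-goterms",
          "-o", ipr_out], None),
        # 2. Attach the InterPro/GO results to the MAKER gene models
        (["ipr_update_gff", maker_gff, ipr_out], domain_gff),
        # 3. Optional: domain hits as separate GFF3 features for browsing
        (["iprscan2gff3", ipr_out, maker_gff], f"{prefix}.visible_domains.gff"),
        # 4. BLASTP against UniProt in tabular format; -db assumes a database
        #    built with makeblastdb from the same FASTA file
        (["blastp", "-query", proteins, "-db", uniprot, "-evalue", "1e-6",
          "-outfmt", "6", "-out", blast_out], None),
        # 5. Putative gene functions from the UniProt hits, for GFF3 and FASTA
        (["maker_functional_gff", uniprot, blast_out, domain_gff],
         f"{prefix}.putative_function.gff"),
        (["maker_functional_fasta", uniprot, blast_out, proteins],
         f"{prefix}.putative_function.proteins.fasta"),
    ]


def as_shell(steps: List[Step]) -> str:
    """Render each step as one shell line, redirecting stdout where needed."""
    lines = []
    for argv, stdout_path in steps:
        line = shlex.join(argv)
        if stdout_path:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names
    print(as_shell(functional_annotation_steps(
        "maker.all.proteins.fasta", "maker.all.gff", "uniprot_sprot.fasta")))
```

Keeping the steps as data rather than calling `subprocess` directly makes it easy to review the pipeline, or paste it into a batch script, before spending cluster time on InterProScan and BLASTP.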
From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 05:49:40 2013
From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL))
Date: Wed, 25 Sep 2013 11:49:40 +0000
Subject: [maker-devel] Path and contents of RepBase
Message-ID:

Hi,
I'm getting the following warning when I run maker v2.28:

WARNING: RepBase is not installed for RepeatMasker. This limits RepeatMasker's functionality and makes the model_org option in the control files virtually meaningless. MAKER will now reconfigure for simple repeat masking only.

In maker_opts.ctl I have:
model_org=all
In maker_exe.ctl I have:
RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker

Instructions in the GMOD maker tutorial state:
"Unpack the contents of the RepBase tarball into the RepeatMasker/Libraries directory."

So, I have RepBase located as follows:
/RepeatMasker/4.0.3/x86_64/bin/Libraries/
The content of this directory is:
RepBase18.08.embl/
RepBase18.08.fasta/

Could someone tell me how/where maker looks for RepBase and which files (embl? fasta? something else?) I need in there?

Many thanks for your help,
Graham

Dr. Graham Etherington
Bioinformatics Support Officer,
The Sainsbury Laboratory,
Norwich Research Park,
Norwich NR4 7UH.
UK
Tel: +44 (0)1603 450601

From carsonhh at gmail.com Wed Sep 25 08:13:40 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:13:40 -0400
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

It's not MAKER that looks for RepBase, it is RepeatMasker. MAKER is just letting you know that you don't have it installed, so you are not surprised by the lack of results RepeatMasker gives you.

You must download RepBase separately from RepeatMasker. When you unpack it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file as well as other files in the .../RepeatMasker/Libraries/ directory. The header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will tell you if it is the minimal library or the full RepBase library.

You have also downloaded the incorrect format, since you have directories named RepBase18.08.embl. You need to go to http://www.girinst.org/server/RepBase/index.php and download the RepeatMasker edition, not the EMBL format one. The contents should be named exactly .../Libraries/RepeatMaskerLib.embl.

Here is a direct link --> http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz

Make sure you are in the .../RepeatMasker/ directory before unpacking the tarball, or you won't get the proper file replacement behavior.

See the RepeatMasker installation instructions here --> http://www.repeatmasker.org/RMDownload.html

Thanks,
Carson
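The checks Carson describes can be automated. The Python sketch below assumes the usual layout in which the `RepeatMasker` script sits in the RepeatMasker directory next to `Libraries/` (which matches Graham's paths), and that EMBL-format download directories start with `RepBase`. The function names are hypothetical, and the sketch only prints the top of the library file; it makes no assumption about the exact header wording that distinguishes the minimal library from full RepBase.

```python
# Sketch, assuming the RepeatMasker script lives in the RepeatMasker/ directory
# next to Libraries/. Function names are hypothetical.
import sys
from pathlib import Path
from typing import List


def library_path(repeatmasker_exe: str) -> Path:
    """Where RepeatMaskerLib.embl is expected, relative to the RepeatMasker script."""
    return Path(repeatmasker_exe).resolve().parent / "Libraries" / "RepeatMaskerLib.embl"


def diagnose(repeatmasker_exe: str) -> List[str]:
    """Return problems found; an empty list means the library file is in place."""
    lib = library_path(repeatmasker_exe)
    if lib.is_file():
        return []
    problems = [f"missing {lib}"]
    if lib.parent.is_dir():
        for entry in sorted(lib.parent.iterdir()):
            # Directories such as RepBase18.08.embl/ suggest the EMBL-format
            # download rather than the RepeatMasker edition.
            if entry.is_dir() and entry.name.startswith("RepBase"):
                problems.append(f"{entry.name}/ looks like the wrong RepBase edition")
    return problems


def header(lib: Path, n: int = 15) -> List[str]:
    """First n lines of the library, where the header describes which library it is."""
    lines = []
    with lib.open(errors="replace") as fh:
        for _, line in zip(range(n), fh):
            lines.append(line.rstrip("\n"))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    exe = sys.argv[1]  # e.g. the RepeatMasker= path from maker_exe.ctl
    issues = diagnose(exe)
    print("\n".join(issues) if issues else "\n".join(header(library_path(exe))))
```

For Graham's original layout, `diagnose("/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker")` would report the missing `RepeatMaskerLib.embl` and flag the `RepBase18.08.embl/` directory.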
From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 08:29:53 2013
From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL))
Date: Wed, 25 Sep 2013 14:29:53 +0000
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

Hi Carson,
Many thanks for the explanation of how RepBase works. I followed your instructions and maker no longer complains.
Thanks for your help,
Graham

From carsonhh at gmail.com Wed Sep 25 08:32:33 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:32:33 -0400
Subject: [maker-devel] Path and contents of RepBase
In-Reply-To:
Message-ID:

Glad it worked. If you have any other questions, just let us know.

Thanks,
Carson
From carsonhh at gmail.com Wed Sep 25 08:35:46 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Sep 2013 10:35:46 -0400
Subject: [maker-devel] maker2 scripts for functional annotation
In-Reply-To:
Message-ID:

If it is launching predictors, then you have snaphmm or augustus_species set. You need to blank out all other options in the control files (including repeat masking options, proteins, ESTs, etc.) when trying to convert match/match_part to gene/mRNA/exons/CDS, or else those other programs will run.

--Carson
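Carson's advice can be applied with a small script instead of by hand. The Python sketch below edits the text of a `maker_opts.ctl` (for example, one generated by `maker -CTL`). It sets `pred_gff=` and `keep_preds=1`, blanks the evidence, repeat-masking and ab initio predictor options, and leaves every other line, such as `genome=`, untouched. The option names in `DISABLE` are assumptions based on a MAKER 2.x control file, so compare them with your own file. The helper name `preds_only` and the file names in the usage comment are hypothetical.

```python
# Hypothetical helper: rewrite maker_opts.ctl text so a MAKER run only turns
# pred_gff match/match_part features into gene models. Option names are
# assumed from a MAKER 2.x maker_opts.ctl; check your own file.
import re
import sys
from typing import Dict

# Options to neutralize: "" leaves the option blank, "0" turns a 0/1 switch off.
DISABLE: Dict[str, str] = {
    "est": "", "altest": "", "est_gff": "", "altest_gff": "",
    "protein": "", "protein_gff": "",
    "model_org": "", "rmlib": "", "repeat_protein": "", "rm_gff": "",
    "snaphmm": "", "gmhmm": "", "augustus_species": "", "fgenesh_par_file": "",
    "est2genome": "0", "protein2genome": "0",
}

# key=value, optionally followed by a #comment
CTL_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+)=([^#]*)(#.*)?$")


def preds_only(ctl_text: str, pred_gff: str) -> str:
    wanted = dict(DISABLE, pred_gff=pred_gff, keep_preds="1")
    seen = set()
    out = []
    for line in ctl_text.splitlines():
        m = CTL_LINE.match(line)
        if m and m.group(2) in wanted:
            indent, key, _old_value, comment = m.groups()
            seen.add(key)
            new_line = f"{indent}{key}={wanted[key]}"
            out.append(f"{new_line} {comment}" if comment else new_line)
        else:
            out.append(line)  # untouched, e.g. genome= and section headers
    missing = {"pred_gff", "keep_preds"} - seen
    if missing:
        raise ValueError(f"control file has no line for: {', '.join(sorted(missing))}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python preds_only.py maker_opts.ctl first_run.all.gff > new_maker_opts.ctl
    with open(sys.argv[1]) as fh:
        sys.stdout.write(preds_only(fh.read(), sys.argv[2]))
```

The script only edits the control file; MAKER still has to be re-run. With the predictors, BLAST evidence and repeat masking switched off, that run should do little more than convert the `pred_gff` features. The sketch raises an error rather than silently adding lines if `pred_gff=` or `keep_preds=` is missing.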
> >For functional annotation, you can use Interproscan, BLASTP against >UniProt, or BALST2GO. My preference is to use InterProScan to add GO >terms and proteins domains via the ipr_update_gff and iprscan2gff3 >scripts. Then add putative gene functions via BLASTP to UniProt and >maker_functional_fasta and maker_functional_gff scripts. > >Go ahead and take a look and that those tools and let me know if you have >any questions or need help you configuring them. > >Thanks, >Carson > > >On 9/19/13 11:53 AM, "Mark Yandell" wrote: > >>Hi Corban & Xia, >> >> >>I've forwarded your question along to the MAKER_dev list, were you can >>get speedy answers to your maker related questions. Thanks for using >>MAKER. >> >>--mark >> >> >>Mark Yandell >>Professor of Human Genetics >>H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of >>Human Genetics University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>ph:801-587-7707 >> >>________________________________________ >>From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] >>Sent: Thursday, September 19, 2013 11:49 AM >>To: Mark Yandell; Corban-Gregory.Rivera at dupont.com >>Subject: maker2 scripts for functional annotation >> >>Dr. Yandell, >> >>We were recently evaluating maker2 for annotation and going through the >>maker tutorial from 2012. >> >>http://gmod.org/wiki/MAKER_Tutorial_2012 >> >>The tutorial makes references to some scripts that we couldn?t find in >>the current release. We were looking for scripts like >>gff3_preds2models to convert match/match_part format into annotations >>with gene/mRNA/exons/CDS and others. I was wondering if maybe we did >>not have the most up to date version. >> >>In addition to getting accurate gene annotations, I was looking for a >>solution to get functional assignments. I see that there are some >>scripts like maker_functional_fasta that may help, but I was wondering >>what you would recommend. 
>> >>Thanks, >> >>Corban & Xia >> >>This communication is for use by the intended recipient and contains >>information that may be Privileged, confidential or copyrighted under >>applicable law. If you are not the intended recipient, you are hereby >>formally notified that any use, copying or distribution of this e-mail, >>in whole or in part, is strictly prohibited. Please notify the sender >>by return e-mail and delete this e-mail from your system. Unless >>explicitly and conspicuously designated as "E-Contract Intended", this >>e-mail does not constitute a contract offer, a contract amendment, or >>an acceptance of a contract offer. This e-mail does not constitute a >>consent to the use of sender's contact information for direct marketing >>purposes or for transfers of data to third parties. >> >>The dupont.com web address will continue in use for a transitional >>period for communications sent or received on behalf of DuPont >>Performance Coatings., which is not affiliated in any way with the >>DuPont Company. >> >>Francais Deutsch Italiano Espanol Portugues Japanese Chinese >>Korean >> >> http://www.DuPont.com/corp/email_disclaimer.html >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Xia.Cao at dupont.com Wed Sep 25 08:31:25 2013 From: Xia.Cao at dupont.com (Xia.Cao at dupont.com) Date: Wed, 25 Sep 2013 14:31:25 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hi Carson, Thank you for the message and your kind help. We tested maker2 by setting keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun. But it seemed maker2 started to launch all predictors again and it took a long time to finish. 
I wonder if there is any way that we can directly get a gene/mRNA/exons/CDS gff file without re-running maker2 to convert match/match_part features into gene/mRNA/exons/CDS. Thanks, Xia From Ambrose.Andongabo at rothamsted.ac.uk Thu Sep 26 05:23:13 2013 From: Ambrose.Andongabo at rothamsted.ac.uk (Ambrose Andongabo (RRes-Roth)) Date: Thu, 26 Sep 2013 11:23:13 +0000 Subject: [maker-devel] Using RNA-seq data from tophat/cufflinks in maker Message-ID: Dear Carson, I have been successfully running the MAKER pipeline trying to improve gene annotations. Strangely, after visualizing my data in GBrowse, I noticed that although my density and coverage plots and even raw read plots clearly show a gene feature in a particular region (confirmed by the cufflinks track), this gene is not called by MAKER and so is not improving my annotation as I expected. I think the problem starts where I converted the cufflinks GTF files to GFF3 using the script you provided (cufflinks2gff3). I would be pleased if you could help explain how to perform the conversion so that it produces a proper GFF3 file that MAKER will then use to instruct the gene predictors. Many thanks in advance Ambrose -- This message has been scanned for viruses and dangerous content by MailScanner, and we believe but do not warrant that this e-mail and any attachments thereto do not contain any viruses. However, you are fully responsible for performing any virus scanning.
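Before suspecting MAKER itself, it can help to confirm that the cufflinks2gff3 output is syntactically valid GFF3: nine tab-separated columns, integer coordinates, a valid strand and phase, and tag=value attributes. Leftover GTF-style attributes such as `gene_id "..."` would be one sign that the conversion did not go as intended. The Python below is a minimal, hypothetical pre-check covering only a few rules of the GFF3 specification. It is not a substitute for a full validator, and it does not check which feature types MAKER expects for est_gff.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def gff3_line_problems(line):
    """Return a list of problems in one GFF3 line; an empty list means none found."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return []  # blank lines, ##directives and #comments are skipped
    cols = line.split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    _seqid, _source, ftype, start, end, _score, strand, phase, attrs = cols
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")
    if strand not in VALID_STRANDS:
        problems.append(f"bad strand {strand!r}")
    if phase not in VALID_PHASES:
        problems.append(f"bad phase {phase!r}")
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    for pair in (p.strip() for p in attrs.split(";")):
        # GFF3 attributes are tag=value; GTF-style 'gene_id "X"' fails here.
        if pair and "=" not in pair:
            problems.append(f"attribute {pair!r} is not tag=value")
    return problems


def check_file(path):
    """Print every problem with its 1-based line number; stop at a ##FASTA section."""
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            if line.startswith("##FASTA"):
                break  # embedded sequence follows, not feature lines
            for problem in gff3_line_problems(line):
                print(f"{path}:{lineno}: {problem}")
```

Running `check_file("cufflinks.gff3")` on the converted file (the name is a placeholder) prints nothing if none of these basic rules are violated.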
From carsonhh at gmail.com Fri Sep 27 04:48:29 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Sep 2013 06:48:29 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: From: Carson Holt Date: Friday, September 27, 2013 6:42 AM To: Subject: Re: [maker-devel] maker2 scripts for functional annotation If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing time is spent verifying reading frame, etc.). If you leave other things on in the control files, then you will get a lot of other processes like a standard MAKER run. Thanks, Carson From: Date: Friday, September 27, 2013 4:34 AM To: Carson Holt Subject: Re: [maker-devel] maker2 scripts for functional annotation Hi... Xia and Carson I've been trying to do something similar to get maker gene models derived from CEGMA predictions, and thought it would be nice to use the CEGMA GFF rather than the protein fasta, as that includes exon structure. The CEGMA output is a GFFv2 variant and I managed to get this into GFFv3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags. (The tags came out as EMBL/databank_entry, mRNA and CDS; not sure if this is valid for pred_gff or not.) If you run maker with pred_gff=my_file and keep_preds=1 with est2genome and protein2genome switched off, then you get a lot of est2genome and blast activity. (I also had pred_stats=1 on one run.) You can prevent most of this by removing the est and protein files from the config :-). 
However, without EST and protein evidence you get no gene models, so (I guess - I'm new to maker also, Carson please correct me if I'm wrong) if you've already run est2genome and protein2genome, then pred_gff could be used to convert your GFF to maker models, if you filter the maker gene models by source. AFAICS if you have est and protein data configured and est2genome and protein2genome switched off, then maker will use these as evidence for your GFF, which means it will have to align them, which could be mistaken for running those analyses. Hope this helps and apologies if I'm wrong! From carsonhh at gmail.com Fri Sep 27 04:48:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Sep 2013 06:48:52 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: So to give a little background to this, the question was how to turn match/match_part into gene/mRNA/exon/CDS like the old gff3_preds2models script.
The steps below will basically just turn maker into a feature type converter and ignore all its other capabilities. That being said, depending on what your final goal is, you might actually want to be running something a different way, but if your only goal is to blindly convert feature types, then those steps will work. Thanks, Carson From: Carson Holt Date: Friday, September 27, 2013 6:42 AM To: Subject: Re: [maker-devel] maker2 scripts for functional annotation If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then the result is maker turning match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing time is spent verifying reading frame, etc.). If you leave other things on in the control files, then you will get a lot of other processes like a standard MAKER run. Thanks, Carson From uqslizbe at uq.edu.au Thu Sep 5 01:30:26 2013 From: uqslizbe at uq.edu.au (Selene Lizbeth Fernandez Valverde) Date: Thu, 5 Sep 2013 17:30:26 +1000 Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks Message-ID: Hi all, I'm currently using Maker to reannotate the genome of the marine sponge.
We already have a set of Augustus predictions and gene models that I mapped back to the genome using the patched map2assembly script posted on the mailing list, as well as PASA transcripts (based on Trinity assemblies) and Cufflinks transcripts. I would like to include both Trinity and Cufflinks, as in some cases one outperforms the other.

I'm currently planning to provide the Trinity/PASA assemblies as FASTA to the "est" option and the Cufflinks assemblies as GFF3 using the "est_gff" option, but I'm wondering if MAKER will take into account both types of evidence? Would it be better to merge both PASA and Cufflinks GFF3s using gff3_merge?

Thanks in advance for the advice,
Selene

**est_gff/est --> These are assumed to be correctly assembled and aligned around splice sites (MAKER uses exonerate to align around splice sites for ESTs in FASTA files). MAKER can use them to infer gene models directly (est2genome option), can use them as support for maintaining predictions, and can use them to modify structure and add UTR to predictions. If you let MAKER try to find alternative splice forms, they will be used to identify support for splice variants. How these cluster with other evidence will help MAKER infer gene boundaries in some cases. MAKER will also use splice sites inferred from the ESTs to inform gene predictors during the prediction step.

Selene Fernandez-Valverde Ph.D.
Postdoctoral Research Fellow
School of Biological Sciences
University of Queensland
St Lucia QLD 4072
Australia
uqslizbe at uq.edu.au

From carsonhh at gmail.com Thu Sep 5 05:04:43 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 07:04:43 -0400
Subject: [maker-devel] Maker: Question on using both Trinity and Cufflinks
In-Reply-To:
Message-ID:

1. I'm wondering if MAKER will take into account both types of evidence?
Yes.

2. Would it be better to merge both PASA and cufflinks gff3s using gff3_merge?
You can provide them as a comma-separated list of files to the est_gff= option, or you can merge them using the gff3_merge script that comes with MAKER.

Unfortunately, I have no single best option for which evidence types to include. Every evidence type can contribute in its own way to the final results. When you test different evidence types, try running on a single large contig and view the results manually in a browser.

Thanks,
Carson

From j.zohren at qmul.ac.uk Thu Sep 5 09:58:39 2013
From: j.zohren at qmul.ac.uk (Jasmin Zohren)
Date: Thu, 5 Sep 2013 16:58:39 +0100
Subject: [maker-devel] Maker in the cloud
Message-ID: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>

Dear Maker developers,

I contacted you a while ago about my annotation of the birch genome (Betula nana). As I am constantly running into problems using our cluster facilities at QMUL, I thought of moving into the cloud. As I am rather inexperienced in cloud computing, I have several questions:

1. To me it seems that there are two different Maker images on EC2 - ami-ea661f83 and ami-b10abed8 - which one is "the right one"?
2. Can I use this Maker AMI for the annotation of a whole genome, or is it only suitable for the tutorial tasks?
3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance; maybe that wasn't enough computing power? Or do you have another explanation for this?
4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I've also recently heard about a parallelisation module for use in the cloud developed by Era7, called "nispero", but I am not sure whether it is publicly available yet.
5. Do you have any experience of how long an annotation task in the cloud would take, and what the expected costs would be? The birch genome is only 500 MB in size, and currently I am simply annotating it with a SNAP-trained HMM. However, in the future I will feed it with RNAseq data as well.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London
intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk

From carsonhh at gmail.com Thu Sep 5 10:26:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Sep 2013 12:26:08 -0400
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hello Jasmin,

I haven't used MAKER in parallel on the cloud before (just tutorial images); however, I believe there is an iPlant Atmosphere image available through iPlant with MAKER version 2.27. You can get a maximum of 16 CPUs per instance there. --> http://www.iplantcollaborative.org/discover/atmosphere

Alternatively, if you have any US-based collaborators, you can apply for a startup allocation on the Lonestar cluster via XSEDE (an allocation can be requested by any US-based researcher and only takes a few days to approve) --> https://www.xsede.org/
That cluster was recently used to process the largest genome ever annotated (the pine genome). Total run time will be less than a day on that cluster, because you can request thousands of CPUs for your job with very short queue wait times.

There is also a work in progress to give access to MAKER on the same cluster via the iPlant Discovery Environment. I've CC'd Joshua Stein, who can correct me if I'm wrong, but I believe that resource would be available to non-US-based researchers as well, and will be available in the very near future (potentially within the next month or less).
Perhaps someone else on the mailing list may want to share their experience using MAKER on the cloud?

Thanks,
Carson

From barry.moore at genetics.utah.edu Thu Sep 5 12:06:05 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 5 Sep 2013 12:06:05 -0600
Subject: [maker-devel] Maker in the cloud
In-Reply-To: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
References: <001f01ceaa50$c9b15230$5d13f690$@qmul.ac.uk>
Message-ID:

Hi Jasmin,

Like Carson, my only significant experience with MAKER in the cloud is using it for our training; however, I'll make some comments based on experience in the cloud with some of our other tools:

There are several cloud architectures available now, but I only have experience with Amazon EC2, so all comments are only relevant there.

I wouldn't use any of the existing MAKER AMIs. All of them were created for tutorial purposes, and while they should work fine for a real annotation job, they will be out of date. At the very least, if you use one, start with it, but install current MAKER code and save it as a new AMI. You can use MPI on the Amazon nodes, but it's not set up by default to run MPI between nodes. That can presumably be done, but we haven't done it, so there may be headaches involved that we just don't know about. However, you could split your input FASTA into several chunks of roughly equal size and fire up a different EC2 node for each FASTA file, then allow MAKER to use MPI to optimize parallelization on each node individually.
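The chunking step Barry describes can be sketched in Python as below. This is an illustration under stated assumptions, not a MAKER utility: the helper names (`read_fasta`, `split_fasta_balanced`, `write_chunks`) and the output naming `<prefix>.<n>.fasta` are hypothetical. "Roughly equal size" is interpreted here as roughly equal total sequence length, balanced greedily: longest sequences first, each placed into the chunk with the fewest bases so far.

```python
"""Split a multi-FASTA file into N chunks of roughly equal total length.

Hypothetical helper for illustration only; it is not part of MAKER.
"""
import heapq
import sys
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    records: List[Record] = []
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif line:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_fasta_balanced(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy split: longest sequences first, each into the lightest chunk."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bases assigned so far, chunk index)
    heapq.heapify(heap)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (total + len(seq), idx))
    return chunks


def write_chunks(chunks: List[List[Record]], prefix: str, width: int = 60) -> None:
    """Write non-empty chunks to <prefix>.<n>.fasta with wrapped sequence lines."""
    for i, chunk in enumerate((c for c in chunks if c), start=1):
        with open(f"{prefix}.{i}.fasta", "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n")
                for pos in range(0, len(seq), width):
                    out.write(seq[pos:pos + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python split_fasta.py genome.fasta 8 genome_chunk
    fasta_path, n, prefix = sys.argv[1], int(sys.argv[2]), sys.argv[3]
    with open(fasta_path) as handle:
        write_chunks(split_fasta_balanced(read_fasta(handle), n), prefix)
```

Run as `python split_fasta.py genome.fasta 8 genome_chunk` (file names hypothetical), this would write up to eight chunk files, one per EC2 node. Longest-first greedy placement is a simple heuristic rather than an optimal partition, but it avoids one node receiving all of the largest scaffolds.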
MAKER is really good at restarting if things fail, so with that in mind I'd suggest starting spot nodes, which can be 10X cheaper than regularly priced nodes. Amazon will kill a spot node as soon as someone comes along who is willing to pay full price, so you'd want a way (either manually checking and restarting nodes, or scripting an AWS API solution) to check whether nodes finished and restart them if they did not, but you could save a lot of money by doing this.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ejr at stowers.org Fri Sep 6 12:34:32 2013
From: ejr at stowers.org (Ross, Eric)
Date: Fri, 6 Sep 2013 18:34:32 +0000
Subject: [maker-devel] maker-devel Digest, Vol 64, Issue 4
In-Reply-To:
Message-ID:

It wouldn't be too difficult to run MAKER using something like StarCluster. StarCluster manages the cluster and nodes for you.

http://star.mit.edu/cluster/

It's not too difficult to use.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From bhall7 at hawaii.edu Wed Sep 11 14:23:28 2013
From: bhall7 at hawaii.edu (Brian Hall)
Date: Wed, 11 Sep 2013 10:23:28 -1000
Subject: [maker-devel] Question about phase for CDS with start codon
Message-ID: <5230D140.7010804@hawaii.edu>

Aloha,

I'm working with a GFF produced by MAKER. (I didn't run the program myself, but I believe it was version 2.24.) Here are the lines in question:

scaffold00033 maker CDS 729494 729949 . - 2 ID=107343;Name=BDOR_005037-RC:cds:250;Parent=107334
scaffold00033 maker start_codon 729947 729949 . - . ID=107349;Name=BDOR_005037-RB:start1;Parent=107334

If I understand correctly, the start codon in this reverse-strand CDS is from position 729949 to 729947 -- the first three bases in the CDS. However, the phase value for the CDS is 2, which essentially skips the start codon.
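For reference, the GFF3 specification defines phase as the number of bases to remove from the beginning of a CDS feature (its 5' end, which is the end coordinate on the minus strand) to reach the first base of the next codon. A CDS segment that begins with the start codon would therefore be expected to have phase 0. The Python sketch below recomputes expected phases for one transcript, assuming the CDS is complete at its 5' end and that its segments are given as (start, end) pairs; `expected_phases` is a hypothetical helper, not part of MAKER or tbl2asn.

```python
"""Recompute expected GFF3 phases for the CDS segments of one transcript.

Sketch assuming the CDS is complete at its 5' end (it begins with the start
codon), so the first segment in transcription order must have phase 0.
expected_phases is a hypothetical helper, not part of MAKER or tbl2asn.
"""
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end): 1-based, inclusive GFF3 coordinates


def expected_phases(segments: List[Segment], strand: str) -> List[Tuple[Segment, int]]:
    """Return (segment, phase) pairs in 5'->3' (transcription) order."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand a feature's 5' end is its *end* coordinate,
    # so transcription order is descending by position.
    ordered = sorted(segments, reverse=(strand == "-"))
    result: List[Tuple[Segment, int]] = []
    coding_bases = 0  # CDS bases contributed by the segments already visited
    for start, end in ordered:
        # Bases to skip in this segment before the next full codon starts.
        phase = (3 - coding_bases % 3) % 3
        result.append(((start, end), phase))
        coding_bases += end - start + 1
    return result
```

Applied to the CDS quoted above (729494-729949 on the minus strand, with the start codon at 729947-729949), the sketch returns phase 0 rather than 2. It only recomputes the expected values; it cannot tell which step in the pipeline introduced the difference.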
Downstream software (tbl2asn) is kicking up a "missing start codon" error.

I have several hundred such issues in the GFF for a single genome. They generally only occur on reverse-strand CDSs. Any ideas?

Sincerest apologies if this is a duplicate question or if I've provided incomplete information. I am new at this. Thanks for your help!

--Brian

From ckuanglim at gmail.com Wed Sep 11 23:42:38 2013
From: ckuanglim at gmail.com (Chan Kuang Lim)
Date: Thu, 12 Sep 2013 13:42:38 +0800
Subject: [maker-devel] Exon Type in MAKER GFF Output
Message-ID:

Dear Maker developers,

I have a question regarding the GFF output of MAKER. When we look at CDS and exon features, we cannot tell whether they are initial, internal, terminal or single. How can we capture the exon type from MAKER output?

Thanks,
Chan KL

From carsonhh at gmail.com Thu Sep 12 08:21:48 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 10:21:48 -0400
Subject: [maker-devel] Exon Type in MAKER GFF Output
In-Reply-To:
Message-ID:

That information is not explicit in GFF3 format. You have to capture all exons parented onto the mRNA, then sort them to identify if the exon is 5-prime, 3-prime, internal, or single exon.

--Carson
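The procedure Carson outlines (collect the exons parented onto each mRNA, sort them, then label by position) might look like the following Python sketch. The helper names `parse_attributes` and `classify_exons` are hypothetical, and the input is assumed to be tab-separated GFF3 with `Parent` attributes; per the GFF3 specification a feature may list several comma-separated parents, so a shared exon is counted for each mRNA.

```python
"""Label exons as initial / internal / terminal / single from GFF3 text.

Illustrative sketch: classify_exons and parse_attributes are hypothetical
helpers, not part of MAKER. Exons are grouped by their Parent (mRNA) ID,
ordered 5'->3' using the strand column, and labelled by position.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def classify_exons(gff_lines: Iterable[str],
                   feature_type: str = "exon") -> Dict[str, List[Tuple[int, int, str]]]:
    """Return {parent_id: [(start, end, label), ...]} in 5'->3' order."""
    exons_by_parent: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    strand_by_parent: Dict[str, str] = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section carries no features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        # GFF3 allows several comma-separated parents (e.g. a shared exon).
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons_by_parent[parent].append((start, end))
                strand_by_parent[parent] = strand
    result: Dict[str, List[Tuple[int, int, str]]] = {}
    for parent, exons in exons_by_parent.items():
        # On the minus strand the 5'-most exon has the highest coordinates.
        ordered = sorted(exons, reverse=(strand_by_parent[parent] == "-"))
        if len(ordered) == 1:
            labels = ["single"]
        else:
            labels = ["initial"] + ["internal"] * (len(ordered) - 2) + ["terminal"]
        result[parent] = [(s, e, label) for (s, e), label in zip(ordered, labels)]
    return result
```

Passing `feature_type="CDS"` applies the same positional labelling to CDS segments, which covers the other half of the question.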
From carsonhh at gmail.com Thu Sep 12 09:27:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Sep 2013 11:27:44 -0400
Subject: Re: [maker-devel] Question about phase for CDS with start codon
In-Reply-To: <5230D140.7010804@hawaii.edu>
Message-ID:

I know there was an incorrect phase issue in a previous MAKER version that is now fixed, but I really doubt that is the issue causing your error. What are you using to convert from GFF3 to tbl format before using tbl2asn? I'd start there. We can send you a GFF3 to tbl converter if that will help.

--Carson

From marc.hoeppner at imbim.uu.se Fri Sep 13 02:15:29 2013
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Fri, 13 Sep 2013 10:15:29 +0200
Subject: [maker-devel] Maker pass-through behavior
Message-ID: <5232C9A1.3060709@imbim.uu.se>

Dear list,

I have started using Maker to explore its use for a number of genome projects we are planning on running. One of the tools we intend to incorporate into our pipeline is PASA (since we will be using Trinity etc.). I would like to pass the (cleaned) output with predicted gene structures to Maker as pass-through annotation (I am optimistic that way...), but I noticed that doing so does not always result in the incorporation of the PASA gene model into the final Maker annotation track. Sometimes it seems to be superseded by an Augustus/Maker model; sometimes the region stays empty (even though a protein alignment is present).

So my question is how Maker handles pass-throughs, exactly. Can it reject pass-throughs, or should it always use such models over any other data source? Is there any scenario where it wouldn't?

I understand that Maker uses some internal scoring system to estimate the accuracy of an annotation - could that be a reason? It would be a bit odd, though, since a lift-over from chicken (to our bird genome) seems to support gene models produced by PASA, yet they are nowhere to be found in the final models.

And a related question: Is there comprehensive documentation where I can get more information on the internal decision-making process of Maker? Or do I have to dig into the code for that?

Cheers,

Marc

PS I have attached a screenshot of such an example - the green track is Maker with proteins + augustus (chicken models) + PASA pass-through of a cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins and trinity transcripts as EST evidence, Black: chicken lift-overs from EnsEMBL)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: igv_snapshot.png Type: image/png Size: 50142 bytes Desc: not available URL: From carsonhh at gmail.com Sun Sep 15 12:39:29 2013 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 15 Sep 2013 14:39:29 -0400 Subject: [maker-devel] Maker pass-through behavior In-Reply-To: <5232C9A1.3060709@imbim.uu.se> Message-ID: > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? pred_gff is treated the same as any other ab initio prediction. It is just one among several candidate gene models. The model that is kept is the one with the lowest AED score (lower means better evidence match/support). Any model with no evidence support or AED=1 will be rejected (no evidence support) unless keep_preds=1 is set. There is also another score eAED which takes into account protein reading frame (protein evidence must be in same reading frame as the gene model). If eAED =1 it will also cause models to be rejected. > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? Possibly. Look at the AED score of the pass-through model in the final MAKER GFF3 to see what the AED score was. If you want to send me GFF3 to look at with a list of regions you are concerned about I can tell you more. Also consider giving PASA results to est_gff as well to bias the scoring algorithm to maintain those models (I.e. Model supports itself, which is reasonable since these are EST derived anyways and not just ab initio predictions). Also the model_gff option will always keep an input model (with or without evidence support) and will only replace it with something else if that something else has a better AED score. > > > > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker? 
Or do I have to dig into the code for that? Look at these two papers --> Holt, C., and Yandell, M. (2011). MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491. Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67. Thanks, Carson On 9/13/13 4:15 AM, "Marc P. Hoeppner" wrote: > Dear list, > > I have started using Maker to explore its use for a number of genome > projects we are planning on running. One of the tools we intend on > incorporating into our pipeline is PASA (Since we will be using Trinity > etc). The (cleaned) output with predicted gene structures I would like > to pass to Maker as pass-through annotation (I am optimistic that > way...) - but I noticed that doing so does not always result in the > incorporation of the PASA gene model into the final maker annotation > track. Sometimes it seems to be superseded by an Augustus/Maker model, > sometimes the region stays empty (even tho a protein alignment is present). > > So my question is how Maker handles pass-throughs, exactly. Can it > reject pass-throughs, or should it always use such models over any other > data source? Is there any scenario were it wouldn't? > > I understand that Maker uses some internal scoring system to estimate > the accuracy of an annotation - could that be a reason? It would be a > bit odd tho, since a lift-over from chicken (to our bird genome) seems > to support gene models produced by PASA, yet they are nowhere to be > found in the final models. > > And a related question: Is there a comprehensive documentation where I > can get more information on the internal decision making process of > Maker? Or do I have to dig into the code for that? 
>
> Cheers,
>
> Marc
>
> PS I have attached a screenshot of such an example - the green track is
> Maker with proteins + augustus (chicken models) + PASA pass-through of a
> cleaned-up gene structure file. (Orange: Cleaned ORFs directly from PASA
> output, Grey: PASA ORFs without cleaning, Dark red: Maker with proteins
> and trinity transcripts as EST evidence, Black: chicken lift-overs from
> EnsEMBL)
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From mhinsley at ebi.ac.uk  Mon Sep 16 03:51:35 2013
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 16 Sep 2013 10:51:35 +0100
Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes
In-Reply-To: <5232C9A1.3060709@imbim.uu.se>
References: <5232C9A1.3060709@imbim.uu.se>
Message-ID: <5236D4A7.6080303@ebi.ac.uk>

Hello

I'm trying to get maker to run on MPI using several nodes. I have an
installation set up by a colleague which includes maker 2.27 and
openmpi-1.4.3. Previously it has only been used (here at EBI) with maker
processes running on one node only, but I find that it can wait a very
long time before being scheduled by LSF.

The command used to submit is like this (as per recommendations from
systems; it uses 8 CPUs on each of 8 nodes):

    export OMP_NUM_THREADS=64
    bsub -q mpi -M 40000 -R "rusage[mem=40000] && span[ptile=8]" -n 64 -o lsf_log -a openmpi mpirun.lsf -np 64 -mca btl tcp,self maker 2>&1

and it requires the environment to be set up in ~/.bashrc for OpenMPI.

This runs but produces a lot of errors like:

DBD::SQLite::db do failed: database is locked at /nfs/production/panda/ensemblgenomes/external/maker/2.27_mpi/maker/bin/../lib/GFFDB.pm line 407.
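SQLite relies on file locks that are unreliable over NFS, which is the usual cause of "database is locked" errors when many MPI ranks share one database file on a network mount. A common workaround, and the approach MAKER is described as taking later in this thread, is to give each node its own copy on local disk. Below is a minimal sketch of that pattern using Python's standard sqlite3, shutil and tempfile modules. The function name `open_local_copy` is hypothetical. The sketch assumes nobody is writing to the shared file during the copy, and writes to the local copy are not merged back.

```python
import os
import shutil
import sqlite3
import tempfile


def open_local_copy(shared_db_path, local_root=None):
    """Copy an SQLite file from shared storage to node-local scratch and open it.

    Each process gets a private copy, so SQLite never needs to lock the
    NFS-mounted file. Changes made to the copy stay local.
    """
    local_root = local_root or tempfile.gettempdir()
    # mkdtemp gives every process its own directory, avoiding name clashes
    local_dir = tempfile.mkdtemp(prefix="gffdb_", dir=local_root)
    local_path = os.path.join(local_dir, os.path.basename(shared_db_path))
    shutil.copy2(shared_db_path, local_path)
    return sqlite3.connect(local_path), local_path
```

Because each rank reads only its own copy, lock contention on the shared file disappears. The trade-off is that results must be written somewhere other than the copied database.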

I've looked at
https://groups.google.com/forum/#!searchin/maker-devel/database$20locked/maker-devel/TscBgbQfBX4/pae016DqlIMJ
which suggests that "It means that your GFF3 results will not be
integrated" (I'm not sure what's meant by that, but the number of genes
I'm getting is around 2k, where I expect more like 15k), that the problem
is SQLite on NFS (a known issue), and that the fix is to use /tmp.

I have TMP= set as per the default in maker_opts.ctl, and there are maker
directories in /tmp on the runtime nodes, but the database (I guess) is in
/nfs/...../maker//.scf.db. I don't see how I could set the working
directory to a non-NFS file system and still use more than one node, but
this error only seems to appear (so far) with est2genome, not when running
SNAP/Augustus.

Is there a workaround to stop getting the locked error, or some way to
recover from it after maker has finished? Or is it necessary to run the
est2genome step (or maker generally) on one node? An obvious option is to
split the assembly, but I was hoping to avoid that.

--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

From carsonhh at gmail.com  Tue Sep 17 21:35:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Sep 2013 21:35:52 -0600
Subject: [maker-devel] SQLite database locked error, maker MPI using several nodes
In-Reply-To: <5236D4A7.6080303@ebi.ac.uk>
Message-ID:

Sorry for the slow reply, I'm currently traveling.

Try deleting any *.db files in the MAKER output directory to force the
SQLite database to be rebuilt. You can also try the current version of
MAKER at yandell-lab.org.

MAKER is supposed to try to copy the database to the /tmp directory
before it starts work.
That way the actual working copy will be local, and will be independent
for each node.

--Carson

From carsonhh at gmail.com  Tue Sep 17 21:57:12 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Sep 2013 21:57:12 -0600
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

It does sound like this is likely the result of gene fusion from the
Trinity assemblies. One thing to look at is the number of coding exons
compared to the other ant species: see whether the increase in exons is
mostly in UTR, coding sequence, or both. One thing you could try is
running MAKER without the EST evidence, just to see how many genes you get
with protein-only support.

There are ways to use multiple MAKER runs to tease out details of the
data. For example:

run1: protein evidence only, plus ab initio predictors like SNAP and
Augustus.

run2: protein and EST evidence. Models from run1 are passed in as pred_gff
with SNAP and Augustus turned off (this will force the addition of UTR, but
not the generation of new models). Use the correct_est_fusion=1 option here
to clip UTR that runs into neighboring genes.

run3: protein and EST evidence plus Augustus and SNAP.

Then take the models from run2, plus the models from run3 that do not
overlap run2, and add them all to your final set along with any models
that come from InterProScan domain analysis of rejected models.
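The final merge step above (keep every run2 model, then add the run3 models that do not overlap any run2 model) is an interval-overlap filter. The rough sketch below assumes each model has already been reduced to its sequence ID and outer gene coordinates, for example from the gene lines of each run's GFF3. The `Model` type and the `merge_runs` name are hypothetical. Overlap is checked without regard to strand, which is an assumption on the conservative side.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str


def overlaps(a: Model, b: Model) -> bool:
    """True if two models share at least one base on the same sequence."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def merge_runs(run2: List[Model], run3: List[Model]) -> List[Model]:
    """Keep every run2 model, plus run3 models that overlap no run2 model.

    Strand is ignored on purpose, so a run3 model on the opposite strand
    of a run2 model is also dropped.
    """
    kept = list(run2)
    for candidate in run3:
        if not any(overlaps(candidate, m) for m in run2):
            kept.append(candidate)
    return kept
```

The pairwise loop keeps the sketch short. For a whole genome you would group models by seqid and sort them by start before comparing. InterProScan-rescued models would then be appended as a separate step.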

This solution is rather lengthy, but it may avoid many of the problems you
seem to be getting with gene merging even with jaccard_clip and
correct_est_fusion turned on, because your ESTs would only contribute to
the UTR and to models not found based solely on protein evidence (i.e.
they would be ignored in cases where you get enough evidence from other
sources).

--Carson

From: Benjamin Rubin
Date: Tuesday, September 17, 2013 10:08 AM
To: Carson Holt
Subject: Re: [maker-devel] Unexpected results with correct_est_fusion

Hi Carson,

The new version is working great. Thanks for your help.

I do have another more general question. I am working on annotating a new
ant genome (Pseudomyrmex gracilis) and the results that I am getting from
MAKER are a bit unexpected. The number of genes produced by MAKER is
~14,300 while, as you may know, the seven published ant genomes have at
least 16,000 genes (this number was improved by several hundred by turning
on correct_est_fusion). Running the ab initio predictions through
InterProScan yields ~900 additional genes for P. gracilis, so there are
still substantially fewer genes found for this species. This difference on
its own is not that unexpected: Pseudomyrmex likely diverged from the other
sequenced ants over 100 million years ago, and the genome sequence itself
is rather fragmented and incomplete.

However, what is bothering me is that, despite having fewer genes, I am
seeing substantially more exons (~92,000 as opposed to 78,000-85,000), and
the total length of all proteins is more than a million amino acids longer
in P. gracilis. It does not have unexpectedly long genes; the average gene
length is just a bit higher. I have looked at the annotations of some
conserved genes and found some apparently spurious exons merged with these
genes. I say that they are spurious because they go beyond the end of the
gene sequence in other species (ants and Drosophila).
Unfortunately, it appears that many of these spurious calls are primarily
the result of BLAST hits to my EST data. The ESTs generally seem to hit the
genome a bit more often than expected. Partly as a result of the
relatively high repeat content of my genome (~50% complex repeats) and
partly because we only used two Illumina libraries, my genome sequence is
quite fragmented (~280 Mb in ~6,500 scaffolds). Note that the total genome
length is estimated at 387 Mb, so I am missing a fair amount, but almost
all CEGMA genes are present in the assembly, so I have concluded that the
missing sequence is predominantly repeats.

I have no prior reason to expect that my EST library has anything wrong
with it. I did a single Illumina lane of RNA-seq and assembled it in
Trinity with the jaccard_clip option on to reduce gene fusions.

If you have any advice on how my gene predictions can be improved, I would
really appreciate it. Have you heard of this kind of problem before? Is
there a way to limit the influence of ESTs without discarding them
entirely?

Thanks so much for your help with the fusion bug and for any advice here.

Ben

On Wed, Sep 11, 2013 at 9:27 AM, Benjamin Rubin wrote:

> Hi Carson,
>
> OK, I will try it and let you know how it goes. And thanks for the
> suggestion about using always_complete as well.
>
> Thanks!
> Ben
>
>
> On Tue, Sep 10, 2013 at 9:45 PM, Carson Holt wrote:
>> I think I have it fixed. Sorry it took so long, but my original fix
>> actually created other odd behaviors, so I had to track those down as
>> well.
>>
>> You can download the test version with the fix by typing this on the
>> command line -->
>>
>> svn co *********
>>
>> user: *****
>> password: *****
>>
>> Test it out and let me know. On the contig you sent me, I also set
>> always_complete=1, as some of the hint-based models were lacking start
>> or stop codons. The results looked slightly better that way as well.
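always_complete=1 asks MAKER to take extra steps so that gene models have both a start and a stop codon. To see which of your own models are incomplete, you can check each model's CDS with a small helper. The sketch below assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and a CDS that is already spliced and given 5' to 3' on the transcript strand. `cds_completeness` is a hypothetical diagnostic, not MAKER's implementation, and it does not look for internal stop codons.

```python
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_completeness(cds: str) -> dict:
    """Report whether a spliced CDS (5'->3', transcript strand) is complete.

    Checks only the first and last codon and the reading frame length;
    internal stop codons are not examined.
    """
    seq = cds.upper()
    in_frame = len(seq) % 3 == 0
    has_start = seq[:3] in START_CODONS
    # A terminal stop only makes sense if the CDS length is a multiple of 3
    has_stop = in_frame and seq[-3:] in STOP_CODONS
    return {
        "in_frame": in_frame,
        "has_start": has_start,
        "has_stop": has_stop,
        "complete": in_frame and has_start and has_stop,
    }
```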
>>
>> Thanks,
>> Carson
>>
>>
>>
>> From: Benjamin Rubin
>> Date: Wednesday, September 4, 2013 10:07 AM
>> To: Carson Holt
>>
>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion
>>
>> OK, great. Thanks for letting me know.
>>
>> Ben
>>
>>
>> On Wed, Sep 4, 2013 at 9:00 AM, Carson Holt wrote:
>>> I thought I'd give you an update on this. I've verified the bug and
>>> think I've identified roughly where it's happening. I'll have a fix for
>>> you to test soon.
>>>
>>> --Carson
>>>
>>>
>>> From: Benjamin Rubin
>>>
>>> Date: Wednesday, August 28, 2013 4:16 PM
>>> To: Carson Holt
>>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion
>>>
>>> Hi Carson,
>>>
>>> OK, I think I uploaded all of the necessary files. I made a directory
>>> named "rubin_data" for everything. I included both the full genome file
>>> ("ec_patch...") as well as a file for scaffold_1. For this scaffold, I
>>> get 132 genes when correct_est_fusion is off and 35 when it is on. These
>>> results are after running maker a first time with correct_est_fusion on
>>> and retraining SNAP/Augustus on the results. The SNAP file is
>>> "gracilis_round_1.hmm" and I think the necessary Augustus files are in
>>> the "gracilis_jaccard_flank100_corrfusion_round_1_results" directory. I
>>> also included gff files for scaffold_1 with and without
>>> correct_est_fusion turned on.
>>>
>>> Let me know if there is anything else that I failed to upload. I really
>>> appreciate your time. Thanks so much.
>>>
>>> Ben
>>>
>>>
>>> On Wed, Aug 28, 2013 at 9:59 AM, Benjamin Rubin
>>> wrote:
>>>> Hi Carson,
>>>>
>>>> Yes, I would be happy to upload the necessary data. Just let me know
>>>> the connection information.
>>>>
>>>> Thanks!
>>>> Ben
>>>>
>>>>
>>>> On Wed, Aug 28, 2013 at 8:09 AM, Carson Holt wrote:
>>>>> Could you pick one contig where the number of genes shifts
>>>>> dramatically and upload that contig FASTA together with your control
>>>>> files and any evidence datasets used to one of our servers (I'm going
>>>>> to send you connection details in a separate e-mail)? I can then run
>>>>> with and without correct_est_fusion to see if there is anything
>>>>> unexpected going on.
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>>
>>>>> From: Benjamin Rubin
>>>>> Date: Tuesday, August 27, 2013 10:59 AM
>>>>> To: Carson Holt
>>>>> Cc:
>>>>> Subject: Re: [maker-devel] Unexpected results with correct_est_fusion
>>>>>
>>>>> Hi Carson,
>>>>>
>>>>> I increased pred_flank to 200 and reran MAKER with correct_est_fusion,
>>>>> but I still only get ~5,000 genes (5,082 instead of the 5,020 with
>>>>> pred_flank at 100). This is using only the first round with SNAP and
>>>>> Augustus trained on the CEGMA genes. Is there anything else that I
>>>>> might be doing wrong? I have attached my control file in case that
>>>>> could be useful.
>>>>>
>>>>> Thanks for the help!
>>>>> Ben
>>>>>
>>>>>
>>>>> On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote:
>>>>>> The correct_est_fusion option just clips UTR on overlapping genes. I
>>>>>> suspect the real problem is setting pred_flank too low. If your
>>>>>> lead-in sequence to a gene is too short, ab initio predictors won't
>>>>>> call it, so you are probably getting empty reports from SNAP/Augustus
>>>>>> for the hint-based predictions. Try increasing pred_flank to at least
>>>>>> 150. Setting pred_flank too low will also limit how far MAKER will
>>>>>> walk out along the edges of initial alignments during the polishing
>>>>>> step (exonerate), so setting it too low may also be causing you to
>>>>>> lose some EST and protein alignments.
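Carson describes correct_est_fusion as clipping UTR on overlapping genes. The idea can be sketched for a single gene and one neighbor: trim the untranslated end that runs into the neighbor, but never trim into the coding span. The sketch assumes each gene is summarized by its transcript span and its CDS span on the same sequence. `Gene` and `clip_utr` are hypothetical names, and this is a simplified illustration, not MAKER's algorithm. A neighbor that overlaps the CDS itself is deliberately left untouched.

```python
from typing import NamedTuple, Tuple


class Gene(NamedTuple):
    tx_start: int   # transcript start including UTR (1-based, inclusive)
    tx_end: int     # transcript end including UTR
    cds_start: int  # coding span, assumed to lie within the transcript span
    cds_end: int


def clip_utr(gene: Gene, neighbor: Tuple[int, int]) -> Gene:
    """Trim UTR that runs into a neighboring gene; never trim the CDS."""
    n_start, n_end = neighbor
    tx_start, tx_end = gene.tx_start, gene.tx_end
    # Neighbor lies past the CDS at higher coordinates: pull the end back.
    if n_start > gene.cds_end and tx_end >= n_start:
        tx_end = max(gene.cds_end, n_start - 1)
    # Neighbor lies before the CDS at lower coordinates: push the start forward.
    if n_end < gene.cds_start and tx_start <= n_end:
        tx_start = min(gene.cds_start, n_end + 1)
    return gene._replace(tx_start=tx_start, tx_end=tx_end)
```

Only UTR bases are removed, so the number of genes should not change. The sketch is an idealization of what the option is described as doing, offered as a reference point when comparing gene counts with and without the option.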
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>
>>>>>> From: Benjamin Rubin
>>>>>> Date: Monday, August 26, 2013 2:20 PM
>>>>>> To:
>>>>>> Subject: [maker-devel] Unexpected results with correct_est_fusion
>>>>>>
>>>>>> Hello developers,
>>>>>>
>>>>>> I am using MAKER 2.28 to annotate an ant genome. I provide protein
>>>>>> sequence evidence from all seven of the other sequenced ant genomes
>>>>>> and a de novo assembled transcriptome as EST evidence. I assembled the
>>>>>> transcriptome using Trinity with the jaccard_clip option turned on to
>>>>>> reduce gene fusions. Despite using this set of hopefully non-fused
>>>>>> ESTs, I still have substantial fusion problems with the final
>>>>>> annotation. Therefore, I reduced pred_flank to 100 and turned on
>>>>>> correct_est_fusion. However, correct_est_fusion leads to the
>>>>>> prediction of a much smaller number of genes (~5,000 instead of
>>>>>> ~14,000). I am initially training both SNAP and Augustus using CEGMA
>>>>>> genes and then retraining based on the first round of annotation.
>>>>>> Both rounds of annotation yield the same low number (~5,000) of
>>>>>> genes. It may also be worth mentioning that the number of exons is
>>>>>> also far lower when using correct_est_fusion (~26,000 instead of
>>>>>> ~90,000).
>>>>>>
>>>>>> Is this the expected behavior of correct_est_fusion? I was surprised
>>>>>> that it reduced the predicted number of genes by such a large margin.
>>>>>> I am concerned that I am using it incorrectly. Do you have any other
>>>>>> suggestions for reducing gene merging?
>>>>>>
>>>>>> Thanks,
>>>>>> Ben
>>>>>>
>>>>>> --
>>>>>> _____________________________________________________
>>>>>> Benjamin ER Rubin
>>>>>> PhD Candidate
>>>>>> Committee on Evolutionary Biology
>>>>>> University of Chicago
>>>>>> http://www.moreaulab.org/Benjamin_Rubin.html
>>>>>>
>>>>>> Division of Insects
>>>>>> Zoology Department
>>>>>> Field Museum of Natural History
>>>>>> 1400 South Lake Shore Drive
>>>>>> Chicago, IL 60605
>>>>>> USA
>>>>>> Office: (312) 665-7776

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From leshin at gmail.com  Wed Sep 18 13:35:10 2013
From: leshin at gmail.com (Le-Shin Wu)
Date: Wed, 18 Sep 2013 15:35:10 -0400
Subject: [maker-devel] running mpi MAKER
Message-ID: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com>

Hi,

I am new to MAKER and just started to use MAKER for doing some genome
annotations. I compiled the MAKER package with MPI support on our
cluster. But when I used the command

    mpiexec -n 64 -hostfile $PBS_NODEFILE maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

to run my MPI MAKER job (I requested 64 processors on two nodes), I got a
whole bunch of warning messages like the one below in my error log file.
Is there anything wrong with this warning message? Thank you.

STATUS: Processing and indexing input FASTA files...
WARNING: Multiple MAKER processes have been started in the
same directory.
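A warning that several MAKER processes were started in the same directory suggests that the ranks did not join a single MPI job and instead ran as independent copies. A first sanity check is to launch a trivial command such as `hostname` with the same launcher and hostfile, then confirm that the expected number of processes appears on the expected nodes. The helper below only summarizes that text output, one hostname per line. `summarize_hostnames` is a hypothetical name. This check covers process placement only. It cannot detect a mismatch between the MPI implementation MAKER was built against and the mpiexec found on the PATH.

```python
from collections import Counter


def summarize_hostnames(output: str, expected_total: int) -> dict:
    """Summarize the text printed by an `mpiexec ... hostname` test run."""
    hosts = [line.strip() for line in output.splitlines() if line.strip()]
    per_host = Counter(hosts)
    return {
        "total": len(hosts),
        "per_host": dict(per_host),
        "matches_expected": len(hosts) == expected_total,
    }
```

For a 64-process request on two nodes, you would expect a total of 64, split across both hostnames according to the hostfile.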

Best

LW

From carsonhh at gmail.com  Wed Sep 18 14:27:32 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Sep 2013 14:27:32 -0600
Subject: [maker-devel] running mpi MAKER
In-Reply-To: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com>
Message-ID:

It means either MAKER was not properly configured for MPI support, or the
communication ring is not launching properly.

Three things:
1. In the .../maker/src/ directory, run './Build status'. Does it say
   MPI_SUPPORT is configured or installed?
2. Run 'which mpiexec' on the command line. What is the path? Is it MPICH2
   mpiexec, or OpenMPI, or something else?
3. Run 'mpiexec -n 64 -hostfile $PBS_NODEFILE hostname' on the command
   line. What does it print out?

Thanks,
Carson

From lewu at indiana.edu  Wed Sep 18 19:30:49 2013
From: lewu at indiana.edu (Le-shin Wu)
Date: Wed, 18 Sep 2013 21:30:49 -0400
Subject: [maker-devel] running mpi MAKER
In-Reply-To:
References: <9C12174B-285F-4777-ADA9-141A2493D97F@gmail.com>
Message-ID:

Hi Carson,

Thanks a lot for your information. When I run './Build status', it shows
the output below, and it looks like MPI SUPPORT is enabled.

==============================================================================
STATUS MAKER 2.27
==============================================================================
PERL Dependencies:      VERIFIED
External Programs:      VERIFIED
External C Libraries:   VERIFIED
MPI SUPPORT:            ENABLED
MWAS Web Interface:     DISABLED
MAKER PACKAGE:          CONFIGURATION OK

But when I run 'which mpiexec', it shows
"/N/soft/mason/openmpi/1.5.4/gcc/bin/mpiexec". So I think I did not use the
correct version of mpiexec while running my MAKER job. Thanks again. I will
try my MAKER job again with the correct mpiexec from MPICH2.

Best

LW

____________________________________________
Le-Shin Wu
Center for Computational Cytomics, Indiana University
http://www.cs.indiana.edu/~lewu
____________________________________________

From mhinsley at ebi.ac.uk  Thu Sep 19 09:37:17 2013
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Thu, 19 Sep 2013 16:37:17 +0100
Subject: [maker-devel] 2.27 and 2.28 incompatible
Message-ID: <523B1A2D.7020300@ebi.ac.uk>

To try to fix the SQLite lock file errors I installed 2.28 and made the
mistake of running it on a directory made by 2.27 (to run SNAP and
Augustus for the first time).

Every contig fails due to errors like:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Can't open file 04723bc0c22478764d90bbaebca96d23
STACK: Error::throw
STACK: Bio::Root::Root::throw /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/Root/Root.pm:472
STACK: Bio::DB::Fasta::fh /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:948
STACK: Bio::DB::Fasta::subseq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:929
STACK: Bio::PrimarySeq::Fasta::seq /nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.pm:1089
STACK: FastaSeq::seq /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm:50
STACK: Process::MpiChunk::_go /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:478
STACK: Process::MpiChunk::run /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm:286
STACK: /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker:667
-----------------------------------------------------------
--> rank=NA, hostname=ebi3-198.ebi.ac.uk
 at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 38
    Error::_throw_Error_Simple('HASH(0x388cb78)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 306
    Error::subs::run_clauses('HASH(0x388cbf0)', '\x{a}------------- EXCEPTION: Bio::Root::Exception -------------\x{a}...', undef, 'ARRAY(0x38a0d18)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Error.pm line 426
    Error::subs::try('CODE(0x38f93f8)', 'HASH(0x388cbf0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/FastaSeq.pm line 95
    FastaSeq::seq('FastaSeq=HASH(0x388dda0)') called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 478
    Process::MpiChunk::_go('Process::MpiChunk=HASH(0x38a0e50)', 'run', 'HASH(0x38a0ec8)', 0, 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 341
    Process::MpiChunk::run('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm line 357
    Process::MpiChunk::run_all('Process::MpiChunk=HASH(0x38a0e50)', 0) called at /nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiTiers.pm line 286
    Process::MpiTiers::run_all('Process::MpiTiers=HASH(0x3867960)', 0) called at /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker line 667

Is there an easy way to reset the datastore/ file names so that I can
switch over to 2.28 without starting over? (e.g. maker -dsindex)

I killed the job and ran 2.27 instead, which seems to be jim dandy.
-- malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD United Kingdom From carsonhh at gmail.com Thu Sep 19 10:06:09 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 10:06:09 -0600 Subject: [maker-devel] 2.27 and 2.28 incompatible In-Reply-To: <523B1A2D.7020300@ebi.ac.uk> Message-ID: There is something very odd, because I've never seen those errors before, and 2.28 should use the same datastore structure as 2.27. I'm going to write a script that will print out certain configuration information about your install that might help me see what's going on. My plane is boarding now, so I'll send it to you later this evening. Thanks, Carson On 9/19/13 9:37 AM, "Malcolm Hinsley" wrote: >To try to fix SQL lock file errors I installed 2.28 and made the mistake >of running on a directory made by 2.27 (to run snap and augustus for the >first time). > >Every contig fails due to errors like: > > >------------- EXCEPTION: Bio::Root::Exception ------------- >MSG: Can't open file 04723bc0c22478764d90bbaebca96d23 >STACK: Error::throw >STACK: Bio::Root::Root::throw >/nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/Root/Root. 
>pm:472 >STACK: Bio::DB::Fasta::fh >/nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.p >m:948 >STACK: Bio::DB::Fasta::subseq >/nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.p >m:929 >STACK: Bio::PrimarySeq::Fasta::seq >/nfs/panda/ensemblgenomes/external/bioperl/BioPerl-1.6.901//Bio/DB/Fasta.p >m:1089 >STACK: FastaSeq::seq >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/FastaSeq.pm:50 >STACK: Process::MpiChunk::_go >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Process/MpiChunk.pm:478 >STACK: Process::MpiChunk::run >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Process/MpiChunk.pm:341 >STACK: Process::MpiChunk::run_all >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Process/MpiChunk.pm:357 >STACK: Process::MpiTiers::run_all >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Process/MpiTiers.pm:286 >STACK: /nfs/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/maker:667 >----------------------------------------------------------- >--> rank=NA, hostname=ebi3-198.ebi.ac.uk > at >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Error.pm >line 38 > Error::_throw_Error_Simple('HASH(0x388cb78)') called at >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../ >lib/Error.pm line 306 > Error::subs::run_clauses('HASH(0x388cbf0)', '\x{a}------------- >EXCEPTION: Bio::Root::Exception -------------\x{a}...', undef, >'ARRAY(0x38a0d18)') called at >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/Error.pm >line 426 > Error::subs::try('CODE(0x38f93f8)', 'HASH(0x388cbf0)') called at >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich >2/bin/../lib/FastaSeq.pm line 95 > FastaSeq::seq('FastaSeq=HASH(0x388dda0)') called at >/nfs/production/panda/ensemblgenomes/external/maker/2.28_mpich2/bin/../lib >/ 
>Process/MpiChunk.pm line 478 > Process::MpiChunk::_go('Process::MpiChunk=HASH(0x38a0e50)', >'run', 'HASH(0x38a0ec8)', 0, 0) called at /nfs/production/panda/ens >emblgenomes/external/maker/2.28_mpich2/bin/../lib/Process/MpiChunk.pm >line 341 > Process::MpiChunk::run('Process::MpiChunk=HASH(0x38a0e50)', 0) >called at /nfs/production/panda/ensemblgenomes/external/maker/2. >28_mpich2/bin/../lib/Process/MpiChunk.pm line 357 > Process::MpiChunk::run_all('Process::MpiChunk=HASH(0x38a0e50)', >0) called at /nfs/production/panda/ensemblgenomes/external/make >r/2.28_mpich2/bin/../lib/Process/MpiTiers.pm line 286 > Process::MpiTiers::run_all('Process::MpiTiers=HASH(0x3867960)', >0) called at /nfs/panda/ensemblgenomes/external/maker/2.28_mpic >h2/bin/maker line 667 > >Is there an easy to reset the datastore/ file names so that i can switch >over to 2.28 without starting over? (eg maker -dsindex) >I killed the job and ran 2.27 instead which seems to be jim dandy. > >-- >malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669 >European Bioinformatics Institute (EMBL-EBI) >European Molecular Biology Laboratory >Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD >United Kingdom > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From myandell at genetics.utah.edu Thu Sep 19 11:53:48 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Thu, 19 Sep 2013 17:53:48 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Hi Corban & Xia, I've forwarded your question along to the MAKER_dev list, were you can get speedy answers to your maker related questions. Thanks for using MAKER. --mark Mark Yandell Professor of Human Genetics H.A. 
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Xia.Cao at dupont.com [Xia.Cao at dupont.com] Sent: Thursday, September 19, 2013 11:49 AM To: Mark Yandell; Corban-Gregory.Rivera at dupont.com Subject: maker2 scripts for functional annotation Dr. Yandell, We were recently evaluating maker2 for annotation and going through the maker tutorial from 2012. http://gmod.org/wiki/MAKER_Tutorial_2012 The tutorial makes references to some scripts that we couldn't find in the current release. We were looking for scripts like gff3_preds2models to convert match/match_part format into annotations with gene/mRNA/exons/CDS and others. I was wondering if maybe we did not have the most up to date version. In addition to getting accurate gene annotations, I was looking for a solution to get functional assignments. I see that there are some scripts like maker_functional_fasta that may help, but I was wondering what you would recommend. Thanks, Corban & Xia This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.
The dupont.com web address will continue in use for a transitional period for communications sent or received on behalf of DuPont Performance Coatings., which is not affiliated in any way with the DuPont Company. Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean http://www.DuPont.com/corp/email_disclaimer.html
From carsonhh at gmail.com Thu Sep 19 15:58:16 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Sep 2013 15:58:16 -0600 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hello Corban & Xia, Some scripts like gff3_preds2models are deprecated. To get the same result as was offered by gff3_preds2models, just give your match/match_part features to pred_gff= in the maker_opts.ctl file, set keep_preds=1, and run with all other options and predictors turned off. The final MAKER result will be your match/match_part features converted into gene/mRNA/exons/CDS. For functional annotation, you can use InterProScan, BLASTP against UniProt, or Blast2GO. My preference is to use InterProScan to add GO terms and protein domains via the ipr_update_gff and iprscan2gff3 scripts. Then add putative gene functions via BLASTP to UniProt and the maker_functional_fasta and maker_functional_gff scripts. Go ahead and take a look at those tools and let me know if you have any questions or need help configuring them. Thanks, Carson
From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 05:49:40 2013 From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL)) Date: Wed, 25 Sep 2013 11:49:40 +0000 Subject: [maker-devel] Path and contents of RepBase Message-ID: Hi, I'm getting the following error when I run maker v2.28: WARNING: RepBase is not installed for RepeatMasker. This limits RepeatMasker's functionality and makes the model_org option in the control files virtually meaningless. MAKER will now reconfigure for simple repeat masking only. In maker_opts.ctl I have: model_org=all In maker_exe.ctl I have: RepeatMasker=/RepeatMasker/4.0.3/x86_64/bin/RepeatMasker Instructions in the GMOD maker tutorial state: "Unpack the contents of the RepBase tarball into the RepeatMasker/Libraries directory." So, I have RepBase located as follows: /RepeatMasker/4.0.3/x86_64/bin/Libraries/ The content of this directory is: RepBase18.08.embl/ RepBase18.08.fasta/ Could someone tell me how/where maker looks for RepBase and which files (embl? fasta? something else?) I need in there? Many thanks for your help, Graham Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH.
UK Tel: +44 (0)1603 450601
From carsonhh at gmail.com Wed Sep 25 08:13:40 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:13:40 -0400 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: It's not MAKER that looks for RepBase, it is RepeatMasker. MAKER is just letting you know that you don't have it installed, so you are not surprised by the lack of results RepeatMasker gives you. You must download RepBase separately from RepeatMasker. When you unpack it, it replaces the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file as well as other files in the .../RepeatMasker/Libraries/ directory. The header of the .../RepeatMasker/Libraries/RepeatMaskerLib.embl file will tell you if it is the minimal library or the full RepBase library. You have also downloaded the incorrect format, since you have directories named RepBase18.08.embl. You need to go to http://www.girinst.org/server/RepBase/index.php and download the RepeatMasker edition, not the EMBL format one. The contents should be named exactly .../Libraries/RepeatMaskerLib.embl. Here is a direct link --> http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz Make sure you are in the .../RepeatMasker/ directory before unpacking the tarball, or you won't get the proper file replacement behavior. See the RepeatMasker installation instructions here --> http://www.repeatmasker.org/RMDownload.html Thanks, Carson
From graham.etherington at sainsbury-laboratory.ac.uk Wed Sep 25 08:29:53 2013 From: graham.etherington at sainsbury-laboratory.ac.uk (graham etherington (TSL)) Date: Wed, 25 Sep 2013 14:29:53 +0000 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: Hi Carson, Many thanks for the explanation of how RepBase works. I followed your instructions and maker no longer complains. Thanks for your help, Graham Dr. Graham Etherington Bioinformatics Support Officer, The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH. UK Tel: +44 (0)1603 450601
From carsonhh at gmail.com Wed Sep 25 08:32:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:32:33 -0400 Subject: [maker-devel] Path and contents of RepBase In-Reply-To: Message-ID: Glad it worked. If you have any other questions, just let us know. Thanks, Carson
From carsonhh at gmail.com Wed Sep 25 08:35:46 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Sep 2013 10:35:46 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: If it is launching predictors, then you have snap hmm or augustus_species set. You need to blank out all other options in the control files (including repeat masking options, proteins, ESTs, etc.) when trying to convert match/match_part to gene/mRNA/exons/CDS, or else those other programs will run. --Carson
From Xia.Cao at dupont.com Wed Sep 25 08:31:25 2013 From: Xia.Cao at dupont.com (Xia.Cao at dupont.com) Date: Wed, 25 Sep 2013 14:31:25 +0000 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: References: <7A60AB257EFF2B48B1F4C814817EA05365E583D7@mxb2.hg.genetics.utah.edu> Message-ID: Hi Carson, Thank you for the message and your kind help. We tested maker2 by setting keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun. But it seemed maker2 started to launch all predictors again and it took a long time to finish. I wonder if there is any way that we can directly get a gene/mRNA/exons/CDS GFF file without re-running maker2 to convert match/match_part features into gene/mRNA/exons/CDS. Thanks, Xia
From Ambrose.Andongabo at rothamsted.ac.uk Thu Sep 26 05:23:13 2013 From: Ambrose.Andongabo at rothamsted.ac.uk (Ambrose Andongabo (RRes-Roth)) Date: Thu, 26 Sep 2013 11:23:13 +0000 Subject: [maker-devel] Using RNA-seq data from tophat/cufflinks in maker Message-ID: Dear Carson, I have been successfully running the MAKER pipeline trying to improve gene annotations. Strangely, after trying to visualize my data in GBrowse, I noticed that although my density and coverage plots and even raw read plots clearly show that there is a gene feature in a particular region (confirmed by the cufflinks track), this is not called by MAKER and thus is not improving my annotation as I expected. I think the problem starts where I converted the cufflinks GTF files to GFF3 using the script you provided (cufflinks2gff3). I would be pleased if you could help explain how I can perform the conversion so that it looks like a proper GFF3 file that MAKER will then use to instruct the gene predictors. Many thanks in advance, Ambrose
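The control-file setup Carson describes in the "maker2 scripts for functional annotation" thread (point pred_gff= at the match/match_part GFF3, set keep_preds=1, and blank out every other evidence, repeat-masking and predictor option so that nothing else runs) can be scripted. Below is a minimal Python sketch that rewrites an existing maker_opts.ctl along those lines. The set of keys it blanks is an assumption based on MAKER 2.x control files, not a list taken from this thread; compare it against the maker_opts.ctl that `maker -CTL` generates for your version before relying on it. Keys not present in your file are simply ignored, and every other line (including genome=) is left untouched.

```python
import re
import sys

# Keys to blank so MAKER runs no evidence alignment, repeat masking, or
# ab initio prediction. ASSUMPTION: names follow MAKER 2.x maker_opts.ctl;
# verify them against your own generated control file.
KEYS_TO_BLANK = {
    "est", "altest", "est_gff", "altest_gff",            # EST/transcript evidence
    "protein", "protein_gff",                            # protein evidence
    "model_org", "rmlib", "repeat_protein", "rm_gff",    # repeat masking
    "snaphmm", "gmhmm", "augustus_species", "fgenesh_par_file",  # predictors
    "model_gff",                                         # pre-existing gene models
}
# Boolean switches forced off (same naming assumption as above).
KEYS_TO_ZERO = {"est2genome", "protein2genome"}

# Matches "key=value #optional comment"; comment-only lines do not match.
CTL_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+)=([^#]*)(#.*)?$")


def rewrite_ctl(text, pred_gff):
    """Return maker_opts.ctl text set up only to convert pred_gff features."""
    out = []
    for line in text.splitlines():
        m = CTL_LINE.match(line)
        if not m:
            out.append(line)  # comments, section headers, blank lines
            continue
        indent, key, _old_value, comment = m.groups()
        if key in KEYS_TO_BLANK:
            value = ""
        elif key in KEYS_TO_ZERO:
            value = "0"
        elif key == "pred_gff":
            value = pred_gff
        elif key == "keep_preds":
            value = "1"
        else:
            out.append(line)  # genome= and all other settings stay as they were
            continue
        suffix = " " + comment if comment else ""
        out.append(f"{indent}{key}={value}{suffix}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python rewrite_ctl.py maker_opts.ctl preds.gff3 out.ctl
    src, gff, dst = sys.argv[1:]
    with open(src) as fh:
        new_text = rewrite_ctl(fh.read(), gff)
    with open(dst, "w") as fh:
        fh.write(new_text)
```

The script name and file names in the usage comment are hypothetical. The rewritten file can then be passed to MAKER in place of the original maker_opts.ctl (MAKER accepts the opts, bopts and exe control files as positional arguments). Rewriting only the keys that need to change, rather than generating a fresh file, keeps the comments and any site-specific settings from the original run intact.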
From carsonhh at gmail.com Fri Sep 27 04:48:29 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Sep 2013 06:48:29 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: From: Carson Holt Date: Friday, September 27, 2013 6:42 AM To: Subject: Re: [maker-devel] maker2 scripts for functional annotation If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you only supply a single pred_gff input and turn everything else off, then MAKER simply turns match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing is the time spent verifying reading frame, etc.). If you leave other things on in the control files, then you will get a lot of other processes, as in a standard MAKER run. Thanks, Carson From: Date: Friday, September 27, 2013 4:34 AM To: Carson Holt Subject: Re: [maker-devel] maker2 scripts for functional annotation Hi Xia and Carson, I've been trying to do something similar to get maker gene models derived from CEGMA predictions, and thought it would be nice to use the CEGMA GFF rather than the protein FASTA, as that includes exon structure. The CEGMA output is a GFFv2 variant and I managed to get this into GFFv3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags (the tags came out as EMBL/databank_entry, mRNA and CDS; not sure if this is valid for pred_gff or not). If you run maker with pred_gff=my_file and keep_preds=1 with est2genome and protein2genome switched off, then you get a lot of est2genome and BLAST activity (I also had pred_stats=1 on one run). You can prevent most of this by removing the EST and protein files from the config :-).
However, without EST and protein evidence you get no gene models, so (I guess; I'm new to maker also, so Carson, please correct me if I'm wrong) if you've already run est2genome and protein2genome, then pred_gff could be used to convert your GFF to maker models, if you filter the maker gene models by source. AFAICS, if you have EST and protein data configured and est2genome and protein2genome switched off, then maker will use these as evidence for your GFF, which means it will have to align them, and that could be mistaken for running those analyses. Hope this helps, and apologies if I'm wrong! On Wednesday, 25 September 2013 15:35:46 UTC+1, Carson Holt wrote: > If it is launching predictors then you have snap hmm or augustus_species > set. You need to blank out all other options in the control files > (including repeat masking options, proteins, ESTs, etc.) when trying to > convert match/match_part to gene/mRNA/exons/CDS, or else those other > programs will run. > > --Carson > > > On 9/25/13 10:31 AM, "Xia... at dupont.com " > wrote: > >> >Hi Carson, >> > >> >Thank you for the message and your kind help. We tested maker2 by setting >> >keep_preds=1, pred_gff=generated_gff_file_from_first_makerRun . But it >> >seemed maker2 started to launch all predictors again and it took long >> >time to finish. I wonder if there is any way that we can directly get >> >gene/mRNA/exons/CDS gff file without re-running maker2 to convert >> >match/match_part features into gene/mRNA/exons/CDS. >> > >> >Thanks, >> >Xia >> > >> >-----Original Message----- >> >From: Carson Holt [mailto:cars... at gmail.com ] >> >Sent: Thursday, September 19, 2013 5:58 PM >> >To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY; >> >maker... at yandell-lab.org >> >Subject: Re: [maker-devel] maker2 scripts for functional annotation >> > >> >Hello Corban & Xia, >> > >> >Some scripts like gff3_preds2models are deprecated. To get the same
To get the same >> >result as was offered by gff3_preds2models, just give your >> >match/match_part features to pred_gff= in the maker_opts.ctl file, set >> >keep_preds=1, and run with all other options and predictors turned off. >> >The final MAKER result will be your match/match_part features converted >> >into gene/mRNA/exons/CDS. >> > >> >For functional annotation, you can use InterProScan, BLASTP against >> >UniProt, or Blast2GO. My preference is to use InterProScan to add GO >> >terms and protein domains via the ipr_update_gff and iprscan2gff3 >> >scripts. Then add putative gene functions via BLASTP against UniProt and the >> >maker_functional_fasta and maker_functional_gff scripts. >> > >> >Go ahead and take a look at those tools and let me know if you have >> >any questions or need help configuring them. >> > >> >Thanks, >> >Carson >> > >> > >> >On 9/19/13 11:53 AM, "Mark Yandell" >> > wrote: >> > >>> >>Hi Corban & Xia, >>> >> >>> >> >>> >>I've forwarded your question along to the MAKER_dev list, where you can >>> >>get speedy answers to your MAKER-related questions. Thanks for using >>> >>MAKER. >>> >> >>> >>--mark >>> >> >>> >> >>> >>Mark Yandell >>> >>Professor of Human Genetics >>> >>H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of >>> >>Human Genetics University of Utah >>> >>15 North 2030 East, Room 2100 >>> >>Salt Lake City, UT 84112-5330 >>> >>ph:801-587-7707 >>> >> >>> >>________________________________________ >>> >>From: Xia... at dupont.com [Xia... at dupont.com ] >>> >>Sent: Thursday, September 19, 2013 11:49 AM >>> >>To: Mark Yandell; Corban-Gre... at dupont.com >>> >>Subject: maker2 scripts for functional annotation >>> >> >>> >>Dr. Yandell, >>> >> >>> >>We were recently evaluating maker2 for annotation and going through the >>> >>maker tutorial from 2012. >>> >> >>> >>http://gmod.org/wiki/MAKER_Tutorial_2012 >>> >> >>> >>The tutorial makes references to some scripts that we couldn't find in >>> >>the current release.
We were looking for scripts like >>> >>gff3_preds2models to convert match/match_part format into annotations >>> >>with gene/mRNA/exons/CDS and others. I was wondering if maybe we did >>> >>not have the most up to date version. >>> >> >>> >>In addition to getting accurate gene annotations, I was looking for a >>> >>solution to get functional assignments. I see that there are some >>> >>scripts like maker_functional_fasta that may help, but I was wondering >>> >>what you would recommend. >>> >> >>> >>Thanks, >>> >> >>> >>Corban & Xia >>> >> >>> >>This communication is for use by the intended recipient and contains >>> >>information that may be Privileged, confidential or copyrighted under >>> >>applicable law. If you are not the intended recipient, you are hereby >>> >>formally notified that any use, copying or distribution of this e-mail, >>> >>in whole or in part, is strictly prohibited. Please notify the sender >>> >>by return e-mail and delete this e-mail from your system. Unless >>> >>explicitly and conspicuously designated as "E-Contract Intended", this >>> >>e-mail does not constitute a contract offer, a contract amendment, or >>> >>an acceptance of a contract offer. This e-mail does not constitute a >>> >>consent to the use of sender's contact information for direct marketing >>> >>purposes or for transfers of data to third parties. >>> >> >>> >>The dupont.com web address will continue in use for a >>> transitional >>> >>period for communications sent or received on behalf of DuPont >>> >>Performance Coatings., which is not affiliated in any way with the >>> >>DuPont Company. >>> >> >>> >>Francais Deutsch Italiano Espanol Portugues Japanese Chinese >>> >>Korean >>> >> >>> >> http://www.DuPont.com/corp/email_disclaimer.html >>> >> >>> >>_______________________________________________ >>> >>maker-devel mailing list >>> >>maker... 
at box290.bluehost.com >>> >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From carsonhh at gmail.com Fri Sep 27 04:48:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Sep 2013 06:48:52 -0400 Subject: [maker-devel] maker2 scripts for functional annotation In-Reply-To: Message-ID: So to give a little background to this, the question was how to turn match/match_part into gene/mRNA/exon/CDS like the old gff3_preds2models script.
The steps in my previous message will basically just turn maker into a feature type converter and ignore all its other capabilities. That being said, depending on what your final goal is, you might actually want to run things a different way, but if your only goal is to blindly convert feature types, then those steps will work. Thanks, Carson
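A minimal Python sketch of the conversion-only setup Carson describes, assuming the usual "key=value #comment" layout of maker_opts.ctl: it points pred_gff at your match/match_part GFF3, sets keep_preds=1, switches off est2genome and protein2genome, and blanks evidence, repeat-masking and predictor options so MAKER does not launch other programs. pred_gff, keep_preds, est2genome, protein2genome, snaphmm ("snap hmm") and augustus_species are named in this thread; the other option names in BLANK_KEYS (est, protein, model_org, rmlib, repeat_protein, gmhmm, etc.) are assumptions about a typical MAKER 2 control file, and the helper conversion_only is hypothetical. Check the keys against the file your own MAKER version writes with maker -CTL before relying on this.

```python
import re

# Named in this thread: snaphmm ("snap hmm") and augustus_species.
# The other keys are ASSUMED MAKER 2 option names for evidence, repeat
# masking and ab initio predictors; verify them against your own file.
BLANK_KEYS = {
    "est", "altest", "est_gff", "altest_gff",          # transcript evidence
    "protein", "protein_gff",                          # protein evidence
    "model_org", "rmlib", "repeat_protein", "rm_gff",  # repeat masking
    "snaphmm", "gmhmm", "augustus_species", "fgenesh_par_file",  # predictors
}
# Evidence-based gene inference, switched off (both named in the thread).
ZERO_KEYS = {"est2genome", "protein2genome"}

# key=value with an optional trailing "#comment"; indentation is preserved.
LINE_RE = re.compile(r"^(\s*)([A-Za-z0-9_]+)=([^#]*?)(\s*#.*)?$")


def conversion_only(ctl_text: str, pred_gff: str) -> str:
    """Rewrite maker_opts.ctl text so only the pred_gff features are processed."""
    out = []
    for line in ctl_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            out.append(line)  # comments and blank lines pass through unchanged
            continue
        indent, key, _old_value, comment = m.groups()
        if key == "pred_gff":
            value = pred_gff  # the match/match_part GFF3 to convert
        elif key == "keep_preds":
            value = "1"       # keep predictions without EST/protein support
        elif key in ZERO_KEYS:
            value = "0"
        elif key in BLANK_KEYS:
            value = ""
        else:
            out.append(line)  # e.g. genome= is left untouched
            continue
        out.append(f"{indent}{key}={value}{comment or ''}")
    return "\n".join(out) + "\n"
```

Usage sketch: read maker_opts.ctl, pass its text and the GFF3 path to conversion_only, write the result back, then run maker as usual. Options not listed (genome, for example) are left unchanged. The design only rewrites known keys rather than regenerating the file, so comments and any site-specific settings survive.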